Fetch a file

The following interfaces provide the functionality to fetch a file:

DFS Client API

The fetch operation through the DFS Client API uses java.io and Hadoop classes. IOUtils.copy(InputStream, OutputStream) copies the file from the server to the local file system.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSConfigKeys;

public class GetObject {
	/*
	 * This program reads a file from HDFS and saves it on the local file system.
	 */
	public static void main(String[] args) throws IOException{
		Configuration conf = new Configuration(true);	//Create a configuration object to define hdfs properties
		conf.set(DFSConfigKeys.FS_DEFAULT_NAME_KEY, "hdfs://192.168.0.1:9000"); // URL for your namenode
		conf.set(DFSConfigKeys.DFS_REPLICATION_KEY, "3"); // Replication count for files you write
		
		OutputStream os = null;
		InputStream is = null;
		try{
			//Initialize DFS FileSystem object with QueryIO configurations 
			FileSystem dfs = FileSystem.get(conf);	
			dfs.mkdirs(new Path("/queryio/demo/"));	//create the directory if it doesn't already exist
			
			is = dfs.open(new Path("/queryio/demo/file1.txt"));	//InputStream for the file to fetch from HDFS
			
			os = new FileOutputStream(new File("/local/queryio.txt"));	//OutputStream to a file on the local file system
			
			IOUtils.copy(is, os);	//copy bytes from InputStream to OutputStream : Fetch File Operation
		} finally {
			try{
				if(is!=null)
					is.close();	//close InputStream
			} catch(Exception e){
				e.printStackTrace();
			}
			try{
				if(os!=null)
					os.close();	//close OutputStream
			} catch(Exception e){
				e.printStackTrace();
			}
		}
	}
}
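
The copy step above can be sketched with plain java.io streams; IOUtils.copy performs essentially this buffered read/write loop internally (the class name, method name, and 4 KB buffer size below are illustrative, not part of any API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
	// Copies all bytes from in to out using a fixed-size buffer and
	// returns the number of bytes copied.
	public static long copy(InputStream in, OutputStream out) throws IOException {
		byte[] buffer = new byte[4096];	//illustrative buffer size
		long total = 0;
		int n;
		while ((n = in.read(buffer)) != -1) {	//read() returns -1 at end of stream
			out.write(buffer, 0, n);	//write only the bytes actually read
			total += n;
		}
		return total;
	}

	public static void main(String[] args) throws IOException {
		InputStream in = new ByteArrayInputStream("Hello, QueryIO user!".getBytes());
		ByteArrayOutputStream out = new ByteArrayOutputStream();
		long copied = copy(in, out);
		System.out.println(copied + " bytes: " + out.toString());
	}
}
```

In the program above, the InputStream comes from dfs.open() and the OutputStream is a FileOutputStream, but the loop is the same for any stream pair.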
	
	

WEBHDFS API

An HTTP GET operation is used to fetch a file. The following sample uses the curl command.

curl -i -L "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?user.name=<USERNAME>&op=OPEN[&offset=<LONG>][&length=<LONG>][&buffersize=<INT>]"
Sample Request:
curl -i -L "http://192.168.0.1:50070/webhdfs/v1/queryio/demo/file1.txt?user.name=admin&op=OPEN"
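
The OPEN URL above follows a fixed structure and can be assembled programmatically. A minimal sketch, where the WebHdfsUrl class and openUrl helper are illustrative names (not part of the Hadoop API) and offset/length are the optional query parameters shown in the template:

```java
public class WebHdfsUrl {
	// Builds a WebHDFS OPEN URL: the /webhdfs/v1 prefix is mandated by the
	// REST API; offset and length may be null to omit the optional parameters.
	public static String openUrl(String host, int port, String path,
			String user, Long offset, Long length) {
		StringBuilder sb = new StringBuilder();
		sb.append("http://").append(host).append(':').append(port)
		  .append("/webhdfs/v1").append(path)
		  .append("?user.name=").append(user)
		  .append("&op=OPEN");
		if (offset != null) sb.append("&offset=").append(offset);
		if (length != null) sb.append("&length=").append(length);
		return sb.toString();
	}

	public static void main(String[] args) {
		// Reproduces the sample request URL from this section
		System.out.println(openUrl("192.168.0.1", 50070,
				"/queryio/demo/file1.txt", "admin", null, null));
	}
}
```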
	

The NameNode redirects the request to a DataNode where the file data can be read. The -L option makes curl follow the redirect to the DataNode and receive the file data.

HTTP Request:
GET /webhdfs/v1/queryio/demo/file1.txt?user.name=admin&op=OPEN HTTP/1.1
User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
Host: 192.168.0.1:50070
Accept: */*

HTTP Response:
HTTP/1.1 307 TEMPORARY_REDIRECT
Expires: Thu, 01-Jan-1970 00:00:00 GMT
Set-Cookie: hadoop.auth="u=admin&p=admin&t=simple&e=1356640451437&s=rxWjJIkQGQCoY4syFgoWnD240YM=";Path=/
Location: http://server.local:50075/webhdfs/v1/queryio/demo/file1.txt?op=OPEN&user.name=admin&namenoderpcaddress=server.local:9000&offset=0
Content-Type: application/octet-stream
Content-Length: 0
Server: Jetty(6.1.26)

HTTP/1.1 200 OK
Content-Length: 16011
Content-Type: application/octet-stream
Server: Jetty(6.1.26)

Hello, QueryIO user!


Copyright
  • Contact Us
  • 2018 QueryIO Corporation. All Rights Reserved.

    QueryIO, "Big Data Intelligence" and the QueryIO Logo are trademarks of QueryIO Corporation. Apache, Hadoop and HDFS are trademarks of The Apache Software Foundation.