Fetch a file

The following interfaces provide the functionality to fetch a file:

DFS Client API

The fetch operation through the DFS Client API uses java.io and Hadoop classes. IOUtils.copy(InputStream, OutputStream) from Apache Commons IO copies the file from the cluster to the local file system.

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSConfigKeys;

public class GetObject {
	/*
	 * This program reads a file from HDFS and saves it on the local file system.
	 */
	public static void main(String[] args) throws IOException{
		Configuration conf = new Configuration(true);	// Create a configuration object to define HDFS properties
		conf.set(DFSConfigKeys.FS_DEFAULT_NAME_KEY, "hdfs://192.168.0.1:9000"); // URI of your NameNode
		conf.set(DFSConfigKeys.DFS_REPLICATION_KEY, "3"); // Replication count for files you write
		
		OutputStream os = null;
		InputStream is = null;
		try{
			//Initialize DFS FileSystem object with QueryIO configurations 
			FileSystem dfs = FileSystem.get(conf);	
			dfs.mkdirs(new Path("/queryio/demo/"));	//creates new directory if it doesn't exist
			
			is = dfs.open(new Path("/queryio/demo/file1.txt"));	// InputStream for the file to fetch
			
			os = new FileOutputStream(new File("/local/queryio.txt"));	//OutputStream to a local filesystem file
			
			IOUtils.copy(is, os);	// copy bytes from InputStream to OutputStream: the fetch operation
		} finally {
			try{
				if(is!=null)
					is.close();	//close InputStream
			} catch(Exception e){
				e.printStackTrace();
			}
			try{
				if(os!=null)
					os.close();	//close OutputStream
			} catch(Exception e){
				e.printStackTrace();
			}
		}
	}
}
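
To run this sample, compile it against the Hadoop client libraries and launch it with the cluster classpath from a machine that can reach the NameNode. A typical invocation (paths are illustrative):

javac -cp $(hadoop classpath) GetObject.java
java -cp .:$(hadoop classpath) GetObject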

WebHDFS API

A file is fetched with an HTTP GET operation. The following sample uses the curl command.

curl -i -L "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?user.name=<USERNAME>&op=OPEN[&offset=<LONG>][&length=<LONG>][&buffersize=<INT>]"
Sample Request:
curl -i -L "http://192.168.0.1:50070/webhdfs/v1/queryio/demo/file1.txt?user.name=admin&op=OPEN"
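
The optional offset and length parameters fetch only part of the file. For example, to read 2048 bytes starting at byte offset 1024 (values are illustrative):
curl -i -L "http://192.168.0.1:50070/webhdfs/v1/queryio/demo/file1.txt?user.name=admin&op=OPEN&offset=1024&length=2048"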

The request is redirected to a DataNode where the file data can be read. The client follows the redirect to the DataNode and receives the file data.

HTTP Request:
GET /webhdfs/v1/queryio/demo/file1.txt?user.name=admin&op=OPEN HTTP/1.1
User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
Host: 192.168.0.1:50070
Accept: */*

HTTP Response:
HTTP/1.1 307 TEMPORARY_REDIRECT
Expires: Thu, 01-Jan-1970 00:00:00 GMT
Set-Cookie: hadoop.auth="u=admin&p=admin&t=simple&e=1356640451437&s=rxWjJIkQGQCoY4syFgoWnD240YM=";Path=/
Location: http://server.local:50075/webhdfs/v1/queryio/demo/file1.txt?op=OPEN&user.name=admin&namenoderpcaddress=server.local:9000&offset=0
Content-Type: application/octet-stream
Content-Length: 0
Server: Jetty(6.1.26)

HTTP/1.1 200 OK
Content-Length: 16011
Content-Type: application/octet-stream
Server: Jetty(6.1.26)

Hello, QueryIO user!
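
The same fetch can also be performed programmatically. Below is a minimal Java sketch using java.net.HttpURLConnection, assuming the NameNode address, user name, and file path from the curl sample above; the class name WebHdfsGet and the local destination path are illustrative.

import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

import org.apache.commons.io.IOUtils;

public class WebHdfsGet {
	/*
	 * Fetches a file over WebHDFS by issuing an HTTP GET with op=OPEN
	 * and following the 307 redirect from the NameNode to a DataNode.
	 */
	public static void main(String[] args) throws Exception {
		// NameNode address, user name and file path match the curl sample above
		URL url = new URL("http://192.168.0.1:50070/webhdfs/v1/queryio/demo/file1.txt?user.name=admin&op=OPEN");
		HttpURLConnection conn = (HttpURLConnection) url.openConnection();
		conn.setRequestMethod("GET");
		conn.setInstanceFollowRedirects(true);	// follow the TEMPORARY_REDIRECT to the DataNode

		InputStream is = null;
		OutputStream os = null;
		try {
			is = conn.getInputStream();	// InputStream over the file data served by the DataNode
			os = new FileOutputStream("/local/queryio.txt");	// OutputStream to a local filesystem file
			IOUtils.copy(is, os);	// copy bytes from InputStream to OutputStream
		} finally {
			IOUtils.closeQuietly(is);	// close InputStream
			IOUtils.closeQuietly(os);	// close OutputStream
			conn.disconnect();
		}
	}
}

Because both the NameNode and the DataNode are reached over plain HTTP, no Hadoop client libraries are required; the only non-JDK dependency here is Commons IO for the stream copy.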

