The following interfaces provide the functionality to perform a get file operation:
The get file operation through the DFS Client API uses java.io, Apache Commons IO, and Hadoop classes. IOUtils.copy(InputStream, OutputStream) from Apache Commons IO copies the file from the server to the local file system.
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSConfigKeys;

public class GetObject {
    /*
     * This program reads a file from HDFS and saves it on the local file system.
     */
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration(true); // Create a configuration object to define HDFS properties
        conf.set(DFSConfigKeys.FS_DEFAULT_NAME_KEY, "hdfs://192.168.0.1:9000"); // URL for your NameNode
        conf.set(DFSConfigKeys.DFS_REPLICATION_KEY, "3"); // Replication count for files you write

        OutputStream os = null;
        InputStream is = null;
        try {
            // Initialize DFS FileSystem object with QueryIO configurations
            FileSystem dfs = FileSystem.get(conf);
            dfs.mkdirs(new Path("/queryio/demo/")); // creates the directory if it doesn't exist

            is = dfs.open(new Path("/queryio/demo/file1.txt")); // InputStream to the object to be fetched
            os = new FileOutputStream(new File("/local/queryio.txt")); // OutputStream to a local file
            IOUtils.copy(is, os); // copy bytes from InputStream to OutputStream: fetch file operation
        } finally {
            try {
                if (is != null)
                    is.close(); // close InputStream
            } catch (Exception e) {
                e.printStackTrace();
            }
            try {
                if (os != null)
                    os.close(); // close OutputStream
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
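If you do not need direct access to the streams, Hadoop's FileSystem API also provides copyToLocalFile(Path, Path), which performs the same fetch in a single call. The following is a minimal sketch assuming the same NameNode address and file paths as in the example above; the class name GetObjectShort is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSConfigKeys;

public class GetObjectShort {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(true);
        conf.set(DFSConfigKeys.FS_DEFAULT_NAME_KEY, "hdfs://192.168.0.1:9000"); // NameNode URL, as above

        // copyToLocalFile(src, dst) opens the HDFS file, streams it to the local
        // path, and closes both ends internally; try-with-resources closes the
        // FileSystem handle when done.
        try (FileSystem dfs = FileSystem.get(conf)) {
            dfs.copyToLocalFile(new Path("/queryio/demo/file1.txt"),
                                new Path("/local/queryio.txt"));
        }
    }
}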
The HTTP OPEN operation is used to fetch a file over the WebHDFS REST API. The following sample is explained using the curl command.
curl -i -L "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?user.name=<username>&op=OPEN
[&offset=<LONG>][&length=<LONG>][&buffersize=<INT>]"
-i option
: Include the HTTP header in the output (server name, document date, HTTP version, etc.).

-L option
: Follow redirects; if the server responds with a Location header, curl re-issues the request to that new location.

<HOST>
: Hostname of the server.

<PORT>
: Port on which the server is listening.

<PATH>
: A valid path of the file.

<DATANODE>:<PORT>
: DataNode host and port.

user.name=<username>
: QueryIO account username for authentication.

op=OPEN
: Opens the file for reading data.

[&offset=<LONG>]
: (Optional) The starting byte position. Default value is 0.

[&length=<LONG>]
: (Optional) The number of bytes to be processed.

[&buffersize=<INT>]
: (Optional) The size of the buffer used in transferring data.

Sample Request:

curl -i -L "http://192.168.0.1:50070/webhdfs/v1/queryio/demo/file1.txt?user.name=admin&op=OPEN"
The request is redirected to a DataNode where the file data can be read. The client follows the redirect to the DataNode and receives the file data.
HTTP Request:

GET /webhdfs/v1/queryio/demo/file1.txt?user.name=admin&op=OPEN HTTP/1.1
User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
Host: 192.168.0.1:50070
Accept: */*

HTTP Response:

HTTP/1.1 307 TEMPORARY_REDIRECT
Expires: Thu, 01-Jan-1970 00:00:00 GMT
Set-Cookie: hadoop.auth="u=admin&p=admin&t=simple&e=1356640451437&s=rxWjJIkQGQCoY4syFgoWnD240YM=";Path=/
Location: http://server.local:50075/webhdfs/v1/queryio/demo/file1.txt?op=OPEN&user.name=admin&namenoderpcaddress=server.local:9000&offset=0
Content-Type: application/octet-stream
Content-Length: 0
Server: Jetty(6.1.26)

HTTP/1.1 200 OK
Content-Length: 16011
Content-Type: application/octet-stream
Server: Jetty(6.1.26)

Hello, QueryIO user!
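The same two-step exchange can be driven from Java. The following is a minimal sketch using java.net.HttpURLConnection; the host, port, path, user name, and local file path are the same illustrative values used above, and the redirect to the DataNode is followed manually to mirror the 307 flow shown in the trace.

import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsGet {
    public static void main(String[] args) throws Exception {
        // Step 1: send the OPEN request to the NameNode; it answers with a
        // 307 redirect whose Location header points at a DataNode.
        URL nn = new URL("http://192.168.0.1:50070/webhdfs/v1/queryio/demo/file1.txt"
                + "?user.name=admin&op=OPEN");
        HttpURLConnection conn = (HttpURLConnection) nn.openConnection();
        conn.setInstanceFollowRedirects(false); // follow the redirect ourselves
        String dataNodeUrl = conn.getHeaderField("Location"); // DataNode URL from the 307 response
        conn.disconnect();

        // Step 2: fetch the file bytes from the DataNode and save them locally.
        HttpURLConnection dn = (HttpURLConnection) new URL(dataNodeUrl).openConnection();
        try (InputStream is = dn.getInputStream();
             OutputStream os = new FileOutputStream("/local/queryio.txt")) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = is.read(buf)) != -1) {
                os.write(buf, 0, n); // stream the response body to the local file
            }
        } finally {
            dn.disconnect();
        }
    }
}

Passing -L to curl performs this redirect handling automatically; the explicit two-connection version above only makes the NameNode-to-DataNode handoff visible.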