Get all files in a directory

This operation retrieves all or some of the files in a directory from the QueryIO server.

The GET operation can be performed through the following interfaces:

DFS Client API

The following code lists the files in a directory.

It uses the Apache Hadoop file system classes. The configuration settings consist of the NameNode URL and the replication count for files. A Hadoop FileSystem object is created from these settings. FileSystem.listStatus(org.apache.hadoop.fs.Path) returns the entries of the directory given in the path, and a for loop traverses them and prints the name of each entry that is a file. FileSystem.listFiles(org.apache.hadoop.fs.Path, boolean) is an iterator-based alternative that returns only files and can optionally recurse into subdirectories.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSConfigKeys;

public class GetBucket {
	/*
	 * This program lists the files directly inside a directory (non-recursively); subdirectories are skipped.
	 */
	public static void main(String[] args) throws IOException{
		Configuration conf = new Configuration(true);	//Create a configuration object to define hdfs properties
		conf.set(DFSConfigKeys.FS_DEFAULT_NAME_KEY, "hdfs://192.168.0.1:9000"); // URL for your namenode
		conf.set(DFSConfigKeys.DFS_REPLICATION_KEY, "3"); // Replication count for files you write
		
		//Initialize DFS FileSystem object with QueryIO configurations 
		FileSystem dfs = FileSystem.get(conf);	//Returns the configured filesystem implementation.
		FileStatus[] statusList = dfs.listStatus(new Path("/queryio/demo"));	//get list of files from directory "demo"
		for(int i=0; i<statusList.length; i++){
			if(statusList[i].isFile()){
				System.out.println(statusList[i].getPath().getName());	//display file name
			}
		}
	}
}

	

WEBHDFS API

To get the files in a given directory, an HTTP request can be used. The following sample uses the curl command.

The syntax of the curl command is:

curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?user.name=<username>&op=LISTSTATUS"

Sample Request:
curl -i "http://192.168.0.1:50070/webhdfs/v1/queryio/demo?user.name=admin&op=LISTSTATUS"
	

The above request returns the contents of the directory "demo".
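
The same request can also be issued from Java without curl. The following is a minimal sketch, not part of the QueryIO client: it assumes Java 11 or later (for java.net.http.HttpClient), and the class name WebHdfsListStatus and its helper methods are hypothetical names chosen for illustration. The host, port, path and user name are the ones from the sample request above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class WebHdfsListStatus {

    // Builds the LISTSTATUS URL for a directory.
    // Assumption: dirPath contains no characters that need URL escaping.
    static URI listStatusUri(String host, int port, String dirPath, String user) {
        String path = dirPath.startsWith("/") ? dirPath : "/" + dirPath;
        String query = "user.name=" + URLEncoder.encode(user, StandardCharsets.UTF_8)
                + "&op=LISTSTATUS";
        return URI.create("http://" + host + ":" + port + "/webhdfs/v1" + path + "?" + query);
    }

    // Sends the GET request and returns the raw JSON body of the response.
    static String listStatus(URI uri) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("LISTSTATUS failed with HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        URI uri = listStatusUri("192.168.0.1", 50070, "/queryio/demo", "admin");
        System.out.println("GET " + uri);
        // Pass "send" as the first argument to actually contact the server.
        if (args.length > 0 && args[0].equals("send")) {
            System.out.println(listStatus(uri));
        }
    }
}
```

Run without arguments, the sketch only prints the URL it would request; run with the argument send, it contacts the server and prints the JSON body.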

The LISTSTATUS operation of the WEBHDFS API returns a JSON object.

HTTP Request:
GET /webhdfs/v1/queryio/demo?user.name=admin&op=LISTSTATUS HTTP/1.1
User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
Host: 192.168.0.1:50070
Accept: */*

HTTP Response:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 1136
Server: Jetty(6.1.26)

{
    "FileStatuses": {
        "FileStatus": [
            {
                "accessTime": 1356603848022, 
                "blockSize": 67108864, 
                "group": "queryio", 
                "length": 16011, 
                "modificationTime": 1356603848045, 
                "owner": "admin", 
                "pathSuffix": "file1.txt", 
                "permission": "644", 
                "replication": 1, 
                "type": "FILE"
            }
        ]
    }
}
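
To mirror the DFS Client example, which prints only file names, the JSON response can be reduced to the "pathSuffix" values of the entries whose "type" is "FILE". The following is a rough sketch using only the JDK; the class name ListStatusNames is hypothetical. It relies on regular expressions and assumes that each FileStatus entry is a flat object and that names contain no braces or escaped quotes. A real JSON parser is the more robust choice outside of a quick illustration. The "logs" directory entry in the sample input is made up to show the filtering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListStatusNames {

    // Matches one innermost JSON object, i.e. a single flat FileStatus entry.
    private static final Pattern ENTRY = Pattern.compile("\\{[^{}]*\\}");
    private static final Pattern SUFFIX =
            Pattern.compile("\"pathSuffix\"\\s*:\\s*\"([^\"]*)\"");
    private static final Pattern TYPE =
            Pattern.compile("\"type\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the pathSuffix of every entry whose type is FILE.
    static List<String> fileNames(String json) {
        List<String> names = new ArrayList<>();
        Matcher entry = ENTRY.matcher(json);
        while (entry.find()) {
            String object = entry.group();
            Matcher type = TYPE.matcher(object);
            Matcher suffix = SUFFIX.matcher(object);
            if (type.find() && "FILE".equals(type.group(1)) && suffix.find()) {
                names.add(suffix.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Shortened LISTSTATUS response; the "logs" entry is a made-up directory.
        String json = "{\"FileStatuses\":{\"FileStatus\":["
                + "{\"pathSuffix\":\"file1.txt\",\"type\":\"FILE\"},"
                + "{\"pathSuffix\":\"logs\",\"type\":\"DIRECTORY\"}]}}";
        System.out.println(fileNames(json)); // prints [file1.txt]
    }
}
```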
	


Copyright 2017 QueryIO Corporation. All Rights Reserved.

QueryIO, "Big Data Intelligence" and the QueryIO Logo are trademarks of QueryIO Corporation. Apache, Hadoop and HDFS are trademarks of The Apache Software Foundation.