Get all files in a Bucket

This operation retrieves all or some of the files in a directory from the QueryIO server.

DFS Client API

The Apache Hadoop file system classes are used for this purpose. The configuration consists of the NameNode URL and the replication count for files. A Hadoop FileSystem object is created from these configuration settings. FileSystem.listStatus(org.apache.hadoop.fs.Path) returns the entries of the directory given by the path. A for loop then traverses these entries and prints the name of each file in the bucket.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSConfigKeys;

public class GetBucket {
	/*
	 * This program lists the files (not the subdirectories) directly inside a folder, non-recursively.
	 */
	public static void main(String[] args) throws IOException{
		Configuration conf = new Configuration(true);	//Create a configuration object to define hdfs properties
		conf.set(DFSConfigKeys.FS_DEFAULT_NAME_KEY, "hdfs://192.168.0.1:9000"); // URL for your namenode
		conf.set(DFSConfigKeys.DFS_REPLICATION_KEY, "3"); // Replication count for files you write
		
		//Initialize DFS FileSystem object with QueryIO configurations 
		FileSystem dfs = FileSystem.get(conf);	//Returns the configured filesystem implementation.
		FileStatus[] statusList = dfs.listStatus(new Path("/queryio/demo"));	//get list of files from directory "demo"
		for(int i=0; i<statusList.length; i++){
			if(statusList[i].isFile()){
				System.out.println(statusList[i].getPath().getName());	//display file name
			}
		}
	}
}

WEBHDFS API

To get the files in a given directory, an HTTP request can be used. The following sample shows the request as sent by the curl command and the response returned by the server.

HTTP Request:
GET /webhdfs/v1/queryio/demo?user.name=admin&op=LISTSTATUS HTTP/1.1
User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
Host: 192.168.0.1:50070
Accept: */*
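
The request above can also be issued from Java without the Hadoop client libraries. The following is a minimal sketch, assuming Java 11 or later and a directory path that needs no URL escaping. The class name ListStatusRequest and its methods buildUrl and fetch are hypothetical names chosen for illustration; the host, port, path and user are the values from the sample request.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of QueryIO or Hadoop.
public class ListStatusRequest {

    // Builds the LISTSTATUS URL for a directory. dirPath must start with "/"
    // and is assumed to contain only URL-safe characters.
    public static String buildUrl(String host, int port, String dirPath, String user) {
        String encodedUser = URLEncoder.encode(user, StandardCharsets.UTF_8);
        return "http://" + host + ":" + port + "/webhdfs/v1" + dirPath
                + "?user.name=" + encodedUser + "&op=LISTSTATUS";
    }

    // Sends the GET request and returns the response body as a string.
    // Throws IOException if the server cannot be reached or reports an error status.
    public static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String url = buildUrl("192.168.0.1", 50070, "/queryio/demo", "admin");
        System.out.println(url);
        // Contact the server only when explicitly asked to, e.g. "java ListStatusRequest --send".
        if (args.length > 0 && args[0].equals("--send")) {
            System.out.println(fetch(url));
        }
    }
}
```

Run without arguments, the program only prints the URL, which corresponds to the request line and Host header shown above. Passing --send performs the request against the sample address, which must be replaced with the address of your own namenode.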

HTTP Response:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 1136
Server: Jetty(6.1.26)

{
    "FileStatuses": {
        "FileStatus": [
            {
                "accessTime": 1356603848022, 
                "blockSize": 67108864, 
                "group": "queryio", 
                "length": 16011, 
                "modificationTime": 1356603848045, 
                "owner": "admin", 
                "pathSuffix": "file1.txt", 
                "permission": "644", 
                "replication": 1, 
                "type": "FILE"
            }
        ]
    }
}
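
To mirror the DFS Client example, the file names can be extracted from this response by reading the pathSuffix of every entry whose type is FILE. The following is a rough sketch that uses only the Java standard library. The class ListStatusParser and its method fileNames are hypothetical, and the regular-expression parsing assumes flat FileStatus objects without braces or escaped quotes inside string values. A real application would more likely use a JSON library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of QueryIO or Hadoop.
public class ListStatusParser {

    // Matches one JSON object that contains no nested objects, i.e. a single FileStatus entry.
    private static final Pattern ENTRY = Pattern.compile("\\{[^{}]*\\}");
    private static final Pattern SUFFIX = Pattern.compile("\"pathSuffix\"\\s*:\\s*\"([^\"]*)\"");
    private static final Pattern IS_FILE = Pattern.compile("\"type\"\\s*:\\s*\"FILE\"");

    // Returns the pathSuffix of every entry of type FILE, in response order.
    public static List<String> fileNames(String json) {
        List<String> names = new ArrayList<>();
        Matcher entry = ENTRY.matcher(json);
        while (entry.find()) {
            String obj = entry.group();
            Matcher suffix = SUFFIX.matcher(obj);
            if (IS_FILE.matcher(obj).find() && suffix.find()) {
                names.add(suffix.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Shortened response with one file and one directory entry.
        String sample = "{\"FileStatuses\":{\"FileStatus\":["
                + "{\"pathSuffix\":\"file1.txt\",\"type\":\"FILE\"},"
                + "{\"pathSuffix\":\"logs\",\"type\":\"DIRECTORY\"}]}}";
        System.out.println(fileNames(sample)); // prints [file1.txt]
    }
}
```

Directory entries are skipped, so the result corresponds to the isFile() check in the DFS Client example.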

Copyright © 2018 QueryIO Corporation. All Rights Reserved.

QueryIO, "Big Data Intelligence" and the QueryIO Logo are trademarks of QueryIO Corporation. Apache, Hadoop and HDFS are trademarks of The Apache Software Foundation.