This operation retrieves all or some of the files in a directory from the QueryIO server.
The GET operation can be performed through the following interfaces:
The following code is used to get a list of all the files in a directory.
Apache Hadoop file system classes are used for this purpose. The configuration settings consist of the namenode URL and the replication count for files; a Hadoop FileSystem object is initialized with these settings.
FileSystem.listStatus(org.apache.hadoop.fs.Path)
is used to get the status of all the files and directories under the path provided. A for
loop is then used to traverse the returned array and print the names of the files in the directory.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSConfigKeys;

public class GetBucket {
    /*
     * This program lists all the files in a folder non-recursively.
     */
    public static void main(String[] args) throws IOException {
        // Create a configuration object to define HDFS properties
        Configuration conf = new Configuration(true);
        conf.set(DFSConfigKeys.FS_DEFAULT_NAME_KEY, "hdfs://192.168.0.1:9000"); // URL for your namenode
        conf.set(DFSConfigKeys.DFS_REPLICATION_KEY, "3");                       // Replication count for files you write

        // Initialize the DFS FileSystem object with the QueryIO configuration
        FileSystem dfs = FileSystem.get(conf); // Returns the configured filesystem implementation

        // Get the list of files and directories under "/queryio/demo"
        FileStatus[] statusList = dfs.listStatus(new Path("/queryio/demo"));
        for (int i = 0; i < statusList.length; i++) {
            if (statusList[i].isFile()) {
                System.out.println(statusList[i].getPath().getName()); // Display the file name
            }
        }
    }
}
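If you also need to descend into subdirectories, FileSystem.listFiles(Path, boolean) returns a RemoteIterator<LocatedFileStatus> that can be traversed with a while loop. The sketch below is a minimal variant of the program above; the class name GetBucketRecursive is ours, and the namenode address and the /queryio/demo path are reused from the sample and should be replaced with values for your cluster.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.DFSConfigKeys;

public class GetBucketRecursive {
    /*
     * Lists all files under a folder, including files in its subdirectories.
     */
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration(true);
        conf.set(DFSConfigKeys.FS_DEFAULT_NAME_KEY, "hdfs://192.168.0.1:9000"); // URL for your namenode

        FileSystem dfs = FileSystem.get(conf);

        // The second argument enables recursive traversal of subdirectories
        RemoteIterator<LocatedFileStatus> files = dfs.listFiles(new Path("/queryio/demo"), true);
        while (files.hasNext()) {
            LocatedFileStatus status = files.next();
            System.out.println(status.getPath().getName()); // Display the file name
        }
    }
}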
To get the files in a given directory, an HTTP request can be used. The following sample is explained using the curl command.
The syntax of the curl command is:
curl -i "http://<HOST>:<PORT>/<PATH>?user.name=<username>&op=LISTSTATUS"
-i option
: Include the HTTP header in the output (server name, date of the document, HTTP version, etc.).
<HOST>
: Hostname of the QueryIO server.
<PORT>
: Port on which the server is running.
<PATH>
: A valid path to the directory.
user.name=<username>
: QueryIO account username, used for authentication.
op=LISTSTATUS
: Get all the files from the directory.

Sample Request:

curl -i "http://192.168.0.1:50070/webhdfs/v1/queryio/demo?user.name=admin&op=LISTSTATUS"
The above request will list all the files in the directory "demo".
The GET directory operation using the WebHDFS API returns a JSON object.
HTTP Request:

GET /webhdfs/v1/queryio/demo?user.name=admin&op=LISTSTATUS HTTP/1.1
User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
Host: 192.168.0.1:50070
Accept: */*

HTTP Response:

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 1136
Server: Jetty(6.1.26)

{
  "FileStatuses": {
    "FileStatus": [
      {
        "accessTime": 1356603848022,
        "blockSize": 67108864,
        "group": "queryio",
        "length": 16011,
        "modificationTime": 1356603848045,
        "owner": "admin",
        "pathSuffix": "file1.txt",
        "permission": "644",
        "replication": 1,
        "type": "FILE"
      }
    ]
  }
}
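The same LISTSTATUS call can also be made from Java without the Hadoop client libraries, using a plain HTTP request. The sketch below is a minimal example and not part of the QueryIO API itself; the class name ListStatusOverHttp is ours, and the namenode address, path, and username are taken from the sample request above. It simply prints the raw JSON response, which can then be parsed with any JSON library.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ListStatusOverHttp {
    public static void main(String[] args) throws IOException {
        // Namenode host/port, directory path, and username taken from the sample request above
        URL url = new URL("http://192.168.0.1:50070/webhdfs/v1/queryio/demo?user.name=admin&op=LISTSTATUS");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        System.out.println("HTTP status: " + conn.getResponseCode());

        // Print the JSON body returned by the LISTSTATUS operation
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        conn.disconnect();
    }
}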