On Ingest Data Tagging

In this chapter

This chapter explains how to tag data on ingest, both through the UI and through the API.

What is On Ingest Tagging

QueryIO supports on ingest data tagging: when you upload data to the cluster, the server automatically runs data analysis procedures (parsers) that extract information from the data being uploaded.

You can also write and register your own data analysis procedures for the file types you want to analyze. For instructions on writing your own On Ingest parser, refer to the developer documentation.
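To illustrate the idea of parsers registered per file type, here is a minimal Python sketch of a registry that dispatches an uploaded file to a parser by its extension and returns the extracted tags. It is not QueryIO's parser API: the `ParserRegistry` class, the `on_ingest` method, the extension-based dispatch and the toy `csv_parser` are all hypothetical names chosen for illustration. The real interface is described in the developer documentation.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict

# Hypothetical parser signature: raw file bytes in, tag name -> value out.
Parser = Callable[[bytes], Dict[str, str]]


class ParserRegistry:
    """Illustrative (not QueryIO's) registry mapping file extensions to parsers."""

    def __init__(self) -> None:
        self._parsers: Dict[str, Parser] = {}

    def register(self, extension: str, parser: Parser) -> None:
        # Normalize ".CSV", "csv", ".csv" to the same key.
        self._parsers[extension.lower().lstrip(".")] = parser

    def on_ingest(self, path: str, data: bytes) -> Dict[str, str]:
        # Pick the parser from the file extension of the uploaded path.
        ext = PurePosixPath(path).suffix.lower().lstrip(".")
        parser = self._parsers.get(ext)
        if parser is None:
            return {}  # no parser registered: the file is stored without tags
        return parser(data)


def csv_parser(data: bytes) -> Dict[str, str]:
    """Toy parser: count columns in the header row and the data rows."""
    lines = data.decode("utf-8").splitlines()
    header = lines[0].split(",") if lines else []
    return {"columns": str(len(header)), "rows": str(max(len(lines) - 1, 0))}


registry = ParserRegistry()
registry.register("csv", csv_parser)
print(registry.on_ingest("/data/sales.csv", b"id,amount\n1,10\n2,20\n"))
# prints {'columns': '2', 'rows': '2'}
```

Files whose type has no registered parser are still accepted; they simply get no tags, which mirrors the idea that parsers are opt-in per file type.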

QueryIO ships with a parser that uses Apache Tika to extract metadata from various types of files. The supported file types are:

You can find the complete source code for this parser in the $INSTALL_HOME/demo/ directory, and you can extend it to support other file types. The compiled classes for this parser are bundled in the $INSTALL_HOME/demo/OnIngest.jar file. By default, this parser is not registered.

Registering a Data Tagger

QueryIO provides built-in On Ingest parsers for the following file types:

To register parsers for specific file types manually, follow the steps below.

You can use the analytics query manager to query the metadata extracted by these parsers.

Tagging Using S3 Compatible REST API

You can tag data using the Amazon S3-compatible REST API. Any request header that starts with the prefix x-amz-meta- is treated as user metadata. To tag data, include one or more x-amz-meta- headers in a PUT Object request; the user-defined metadata is then stored along with the data.
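The following Python sketch shows the shape of such a request using only the standard library: each tag becomes one x-amz-meta-<name> header on a PUT request, and a second helper shows how headers with that prefix are separated out as user metadata. The endpoint `http://queryio.example.com`, the bucket and key names, and path-style addressing (`<endpoint>/<bucket>/<key>`) are assumptions for illustration. Authentication (request signing) is omitted; check the developer documentation for what your QueryIO server requires.

```python
import urllib.request
from typing import Dict, Mapping

META_PREFIX = "x-amz-meta-"


def build_put_object(endpoint: str, bucket: str, key: str, body: bytes,
                     tags: Mapping[str, str]) -> urllib.request.Request:
    """Build (but do not send) a PUT Object request carrying user metadata.

    Assumes path-style addressing; request signing is not included.
    """
    headers = {"Content-Type": "application/octet-stream"}
    for name, value in tags.items():
        # Each tag becomes one x-amz-meta-<name> header.
        headers[META_PREFIX + name.lower()] = value
    url = f"{endpoint.rstrip('/')}/{bucket}/{key}"
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


def user_metadata(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return only the x-amz-meta- headers, with the prefix stripped.

    Header names are compared case-insensitively, as HTTP requires
    (urllib stores them as e.g. "X-amz-meta-project").
    """
    return {
        name.lower()[len(META_PREFIX):]: value
        for name, value in headers.items()
        if name.lower().startswith(META_PREFIX)
    }


# Hypothetical endpoint, bucket and key, for illustration only.
req = build_put_object("http://queryio.example.com", "mybucket", "reports/q1.csv",
                       b"id,amount\n1,10\n", {"project": "alpha", "owner": "analytics"})
print(req.get_method(), req.full_url)
# prints PUT http://queryio.example.com/mybucket/reports/q1.csv
print(user_metadata(req.headers))
# prints {'project': 'alpha', 'owner': 'analytics'}
```

Passing the request to `urllib.request.urlopen(req)` would send it; against a real server you would also need whatever authentication it expects.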

Sample Request with the x-amz-meta- Header

For more details about the REST API, refer to the developer documentation.

Copyright 2018 QueryIO Corporation. All Rights Reserved.

QueryIO, "Big Data Intelligence" and the QueryIO Logo are trademarks of QueryIO Corporation. Apache, Hadoop and HDFS are trademarks of The Apache Software Foundation.