This chapter explains on-ingest data tagging through the UI and the API.
QueryIO supports on-ingest data tagging. When you upload data to the cluster, the server automatically runs data analysis procedures (parsers) that extract information from the data being uploaded.
You can also write and register your own data analysis procedures for different file types that you want to analyze. To see how you can write your own On Ingest Parser, refer to the developer documentation.
QueryIO ships with a parser that uses Apache Tika to extract metadata from various types of files. The file types supported are:
You can find the complete source code for this parser in the $INSTALL_HOME/demo/ directory, and you can enhance the parser to support other file types. The compiled classes for this parser are bundled in the $INSTALL_HOME/demo/OnIngest.jar file. By default, this parser is not registered.
QueryIO provides built-in On Ingest parsers for the following file types:
To manually register parsers for specific file types, follow the steps below.
You can use the analytics query manager to query the metadata extracted by these parsers.
You can tag data using the Amazon S3 compatible REST API. Any header that starts with the prefix x-amz-meta- is treated as user metadata.
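The prefix rule above can be illustrated with a minimal Python sketch. It is not QueryIO's server code; the function name extract_user_metadata is hypothetical, and matching header names case-insensitively is an assumption based on how HTTP header names generally behave.

```python
# Hypothetical helper, for illustration only: it mirrors the rule that any
# header starting with "x-amz-meta-" carries user metadata.
USER_METADATA_PREFIX = "x-amz-meta-"


def extract_user_metadata(headers):
    """Return {tag_name: value} for every header that has the user-metadata prefix."""
    metadata = {}
    for name, value in headers.items():
        lowered = name.lower()  # assumption: header names are compared case-insensitively
        if lowered.startswith(USER_METADATA_PREFIX):
            tag_name = lowered[len(USER_METADATA_PREFIX):]
            metadata[tag_name] = value
    return metadata


request_headers = {
    "Host": "QueryIO.com",
    "x-amz-meta-author": "QueryIO",
    "x-amz-meta-priority": "high",
}
tags = extract_user_metadata(request_headers)
```

Here tags holds only the author and priority entries; the Host header is ignored because it lacks the prefix.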
To tag data, send one or more x-amz-meta- headers with a PUT Object request.
The user-defined metadata in these headers is stored along with the object data.
PUT DIR1/File1.pdf HTTP/1.1
Host: QueryIO.com
Authorization: iffo6l9hel2hfmbj2384joljgh9mqga58gb9if9593ucli9ke5s2e3854shhcmmm
x-amz-meta-author: QueryIO
x-amz-meta-priority: high
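A request like the one above could be assembled from a client roughly as follows. This is a sketch using only the Python standard library. The helper name build_tagged_put, the http:// scheme, the base URL, and the placeholder token are assumptions for illustration; use the endpoint and authorization token of your own QueryIO installation.

```python
from urllib.request import Request

USER_METADATA_PREFIX = "x-amz-meta-"


def build_tagged_put(base_url, object_path, body, tags, auth_token):
    """Build (but do not send) a PUT Object request that carries user metadata.

    base_url and auth_token are placeholders supplied by the caller; nothing
    here is specific to a particular QueryIO deployment.
    """
    headers = {"Authorization": auth_token}
    for key, value in tags.items():
        # Each tag becomes one header with the user-metadata prefix.
        headers[USER_METADATA_PREFIX + key] = value
    url = base_url.rstrip("/") + "/" + object_path.lstrip("/")
    return Request(url, data=body, headers=headers, method="PUT")


request = build_tagged_put(
    "http://QueryIO.com",            # assumed scheme and host
    "DIR1/File1.pdf",
    b"%PDF-1.4 example bytes",       # stand-in for the real file content
    {"author": "QueryIO", "priority": "high"},
    "your-authorization-token",      # placeholder, not a real token
)

# To actually upload, pass the request to urllib.request.urlopen(request).
# Note: urllib stores header names capitalized ("X-amz-meta-author"); HTTP
# header names are case-insensitive, so the prefix is still recognizable.
```

Keeping request construction separate from sending makes it easy to inspect the headers before any data leaves the client.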
For further details about the REST API, refer to the developer documentation.