Managing MapReduce jobs

In this chapter

MapReduce is a programming model for processing large data sets.
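To make the model concrete, here is a minimal sketch in plain Python that imitates the three stages of a MapReduce job (map, shuffle, reduce) on an in-memory list. It does not use Hadoop or any QueryIO API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    A real framework does this between the map and reduce phases;
    here it is simulated with a dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence over in-memory records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = run_job(["big data", "big jobs"])
print(counts)
```

In a real cluster the map and reduce steps run in parallel on many nodes and the framework handles the grouping; the sketch only shows how the data flows between the stages.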

This chapter shows how to add and execute MapReduce jobs using QueryIO. It assumes that you have already configured the ResourceManager and NodeManager nodes, along with the NameNode and DataNode.

QueryIO ships with MapReduce jobs for parsing CSV and LOG files. The CSV job lets you apply filter expressions to the file contents and inserts the results into the database. The LOG job lets you search for particular messages or exceptions and inserts the results into the database. The CSV and LOG parser jobs are bundled in $INSTALL_HOME/demo/CSVParserJob.jar and $INSTALL_HOME/demo/LOGParserMRJob.jar, respectively.
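As a rough illustration of the kind of work a log-search job performs, the following plain-Python sketch scans log lines for exception names, keeps the matching lines, and counts occurrences per exception. It is not the code inside LOGParserMRJob.jar and does not write to a database; the regular expression, the sample log lines, and the names `map_log_line` and `search_logs` are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumed pattern: any identifier ending in "Exception" or "Error" (case-sensitive).
EXCEPTION_PATTERN = re.compile(r"\b(\w+(?:Exception|Error))\b")


def map_log_line(line):
    """Map step: emit (exception_name, line) if the line mentions an exception."""
    match = EXCEPTION_PATTERN.search(line)
    if match:
        yield match.group(1), line


def search_logs(lines):
    """Collect matching lines and count occurrences per exception name."""
    hits = [pair for line in lines for pair in map_log_line(line)]
    counts = Counter(name for name, _ in hits)
    return hits, counts


# Hypothetical sample input, not taken from any real log.
sample = [
    "2017-01-01 10:00:00 INFO Service started",
    "2017-01-01 10:00:05 ERROR java.io.IOException: disk full",
    "2017-01-01 10:00:09 ERROR java.io.IOException: disk full",
    "2017-01-01 10:00:12 WARN NullPointerException in worker",
]

hits, counts = search_logs(sample)
for name, total in counts.items():
    print(name, total)
```

In an actual job the matching rows would be written to the configured database rather than printed, and the search expression would come from the job configuration.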

QueryIO exposes interfaces that let programmers write their own custom MapReduce jobs. To learn how to write one, refer to the developer documentation.

This chapter guides you through adding and executing the MapReduce job for parsing LOG files.

Adding MapReduce Job

Executing MapReduce Job

Checking Job Status

You can use Query Manager to query the information extracted by MapReduce jobs.



Copyright 2017 QueryIO Corporation. All Rights Reserved.

QueryIO, "Big Data Intelligence" and the QueryIO Logo are trademarks of QueryIO Corporation. Apache, Hadoop and HDFS are trademarks of The Apache Software Foundation.