A typical MapReduce job comprises the following modules:
The input reader divides the input into independent chunks, called input splits; each split is assigned to exactly one Map function. The input reader generates key/value pairs from the data in its split.
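As a plain-Java sketch of this step (the class name `LineReader` is illustrative; Hadoop's own `TextInputFormat` similarly emits one (byte offset, line) pair per line of a split):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative input reader: turn one input split (here, a String) into
// key/value pairs, using the byte offset of each line as the key and the
// line text as the value.
public class LineReader {
    public static List<Map.Entry<Long, String>> read(String split) {
        List<Map.Entry<Long, String>> pairs = new ArrayList<>();
        long offset = 0;
        for (String line : split.split("\n", -1)) {
            pairs.add(Map.entry(offset, line));
            offset += line.length() + 1; // +1 for the newline separator
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(read("first line\nsecond line"));
    }
}
```

Each pair produced here would be handed to one invocation of the Map function.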
The Mapper function maps input key/value pairs to a set of intermediate key/value pairs. Maps are the individual tasks that transform input records into intermediate records. The intermediate records need not be of the same type as the input records, and a given input pair can map to zero or many output pairs.
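A minimal sketch of a map function in plain Java (without the Hadoop API; the word-count example and the class name `WordCountMap` are illustrative, not from the original):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

// Illustrative word-count map: one input record (a line of text) becomes
// zero or more intermediate (word, 1) pairs. The intermediate type
// (String, Integer) need not match the input type (a plain String line).
public class WordCountMap {
    public static List<Entry<String, Integer>> map(String line) {
        List<Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map("to be or not to be"));
    }
}
```

Note that an empty input line maps to zero output pairs, while a six-word line maps to six.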
The MapReduce framework calls the application's Reduce function once for each unique key, in sorted order. The Reduce function iterates through the values associated with that key and produces zero or more outputs. The number of reducers for a job can be configured via JobConf.setNumReduceTasks(int). For applications that do not require reduction, the number of reducers can be set to zero; in that case, the outputs from the mapper are fed directly to the output writer.
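The grouping and per-key invocation described above can be sketched in plain Java as follows (a simplified, single-machine stand-in for the framework's shuffle phase; the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative shuffle + reduce: group intermediate (word, count) pairs by
// key, then call the reduce function once per unique key, in sorted order.
public class WordCountReduce {
    // Reduce sums the values associated with one key.
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // A TreeMap keeps keys sorted, mirroring the framework's sorted
    // iteration over unique keys.
    public static Map<String, Integer> shuffleAndReduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
            Map.entry("be", 1), Map.entry("to", 1),
            Map.entry("be", 1), Map.entry("to", 1));
        System.out.println(shuffleAndReduce(pairs));
    }
}
```

With zero reducers configured, this grouping step is skipped entirely and the mapper's pairs go straight to the output writer.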
The output writer writes the reducer's output to a storage system, usually HDFS or a database.
For information on how to write MapReduce jobs, refer to the official Hadoop MapReduce tutorial.