Configure MapReduce

System configuration defines the computers, processes, and devices that compose the system and its boundary. More generally, the system configuration is the specific definition of the elements that a system is composed of.

For a MapReduce cluster, it consists of configuration properties for MapReduce, ResourceManager, and NodeManager.

To configure ResourceManager, NodeManager, or MapReduce properties, click Configure MapReduce under the ADMIN menu tab. Change the properties as required and click Save to apply the changes.

The following properties can be configured:

| Type | Key | Default Value | Description |
| --- | --- | --- | --- |
| Map Reduce | yarn.resourcemanager.address | 0.0.0.0:8040 | The address of the applications manager interface in the RM. |
| Map Reduce | yarn.resourcemanager.scheduler.address | 0.0.0.0:8141 | The address of the scheduler interface. |
| Map Reduce | yarn.resourcemanager.webapp.address | 0.0.0.0:8088 | The address of the RM web application. |
| Map Reduce | yarn.resourcemanager.resource-tracker.address | 0.0.0.0:8025 | The address of the RM Resource Tracker. |
| Map Reduce | yarn.resourcemanager.admin.address | 0.0.0.0:8141 | The address of the RM admin interface. |
| Map Reduce | mapreduce.job.hdfs-servers | ${fs.default.name} | HDFS Server URI. |
| Map Reduce | mapreduce.framework.name | yarn | The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn. |
| Map Reduce | mapreduce.map.memory.mb | 1536 | Larger resource limit for maps. |
| Map Reduce | mapreduce.reduce.memory.mb | 3072 | Larger resource limit for reduces. |
| Map Reduce | mapreduce.task.io.sort.mb | 512 | The total amount of buffer memory to use while sorting files, in megabytes. By default, gives each merge stream 1MB, which should minimize seeks. |
| Map Reduce | mapreduce.task.io.sort.factor | 100 | The number of streams to merge at once while sorting files. This determines the number of open file handles. |
| Map Reduce | mapreduce.reduce.shuffle.parallelcopies | 50 | The default number of parallel transfers run by reduce during the copy (shuffle) phase. |
| Map Reduce | queryio.yarn.log-dir | | Where log files are stored. Used by the QueryIO server for YARN runtime configuration. |
| Map Reduce | queryio.yarn.pid-dir | | The directory where pid files are stored. Used by the QueryIO server for YARN runtime configuration. |
| Map Reduce | queryio.yarn.heap-size | 4096 | The maximum amount of heap to use, in MB. Used by the QueryIO server for YARN runtime configuration. |
| Node Manager | yarn.nodemanager.address | 0.0.0.0:0 | Address of node manager IPC. |
| Node Manager | yarn.nodemanager.localizer.address | 0.0.0.0:4344 | Address where the localizer IPC is. |
| Node Manager | yarn.nodemanager.container-manager.thread-count | 5 | Number of threads container manager uses. |
| Node Manager | yarn.nodemanager.localizer.client.thread-count | 5 | Number of threads to handle localization requests. |
| Node Manager | yarn.nodemanager.heartbeat.interval-ms | 1000 | Heartbeat interval to the RM, in milliseconds. |
| Node Manager | yarn.nodemanager.local-dirs | /tmp/nm-local-dir | List of directories to store localized files in. |
| Node Manager | yarn.nodemanager.log-dirs | /tmp/logs | Where to store container logs. |
| Node Manager | yarn.nodemanager.resource.memory-mb | 8192 | Amount of physical memory, in MB, that can be allocated for containers. |
| Node Manager | yarn.nodemanager.webapp.address | 0.0.0.0:9999 | NM Webapp address. |
| Node Manager | yarn.nodemanager.aux-services | mapreduce.shuffle | Shuffle service that needs to be set for Map Reduce applications. |
| Node Manager | queryio.nodemanager.options | -Dcom.sun.management.jmxremote $YARN_NODEMANAGER_OPTS -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl.need.client.auth=false -Dcom.sun.management.jmxremote.ssl=false | Node Manager specific runtime options. Used by the QueryIO server for YARN runtime configuration. |
| Resource Manager | yarn.resourcemanager.client.thread-count | 10 | The number of threads used to handle applications manager requests. |
| Resource Manager | yarn.resourcemanager.scheduler.client.thread-count | 10 | Number of threads to handle scheduler interface. |
| Resource Manager | yarn.resourcemanager.admin.client.thread-count | 1 | Number of threads used to handle RM admin interface. |
| Resource Manager | yarn.resourcemanager.resource-tracker.client.thread-count | 10 | Number of threads to handle resource tracker calls. |
| Resource Manager | yarn.scheduler.minimum-allocation-mb | 128 | The minimum allocation size for every container request at the RM, in MBs. Memory requests lower than this won't take effect, and the specified value will get allocated at minimum. |
| Resource Manager | yarn.scheduler.maximum-allocation-mb | 10240 | The maximum allocation size for every container request at the RM, in MBs. Memory requests higher than this won't take effect, and will get capped to this value. |
| Resource Manager | mapreduce.jobhistory.address | 0.0.0.0:10020 | MapReduce JobHistory Server IPC host:port. |
| Resource Manager | mapreduce.jobhistory.webapp.address | 0.0.0.0:19888 | MapReduce JobHistory Server Web UI host:port. |
| Resource Manager | mapreduce.jobhistory.intermediate-done-dir | /mr-history/tmp | Directory where history files are written by MapReduce jobs. |
| Resource Manager | mapreduce.jobhistory.done-dir | /mr-history/done | Directory where history files are managed by the MR JobHistory Server. |
| Resource Manager | queryio.unit.num.splits | 100 | Number of splits for each mapper. |
| Resource Manager | queryio.resourcemanager.options | -Dcom.sun.management.jmxremote $YARN_RESOURCEMANAGER_OPTS -Dcom.sun.management.jmxremote.port=9008 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl.need.client.auth=false -Dcom.sun.management.jmxremote.ssl=false | Resource Manager specific runtime options. Used by the QueryIO server for YARN runtime configuration. |

NOTE: All descriptions are part of Apache Hadoop documentation.
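
The memory-related defaults interact with each other, so it can help to check them together before saving changes. The following Python sketch is only an illustration and is not part of QueryIO: it estimates how many map or reduce containers fit on one NodeManager using the default values listed above. It assumes that a request below the scheduler minimum is raised to the minimum and a request above the maximum is capped, and it ignores the ApplicationMaster container, CPU limits, and any scheduler-specific rounding. The helper name containers_per_node is hypothetical.

```python
# Default values (in MB) for the memory-related keys on this page.
defaults = {
    "yarn.nodemanager.resource.memory-mb": 8192,
    "yarn.scheduler.minimum-allocation-mb": 128,
    "yarn.scheduler.maximum-allocation-mb": 10240,
    "mapreduce.map.memory.mb": 1536,
    "mapreduce.reduce.memory.mb": 3072,
}


def containers_per_node(conf, request_key):
    """Estimate how many containers of one kind fit on a single NodeManager.

    Hypothetical helper for illustration only.
    """
    min_mb = conf["yarn.scheduler.minimum-allocation-mb"]
    max_mb = conf["yarn.scheduler.maximum-allocation-mb"]
    requested = conf[request_key]
    # Assumption: the request is raised to the minimum and capped at the maximum.
    granted = min(max(requested, min_mb), max_mb)
    # Integer division: only whole containers can be allocated.
    return conf["yarn.nodemanager.resource.memory-mb"] // granted


map_slots = containers_per_node(defaults, "mapreduce.map.memory.mb")
reduce_slots = containers_per_node(defaults, "mapreduce.reduce.memory.mb")
print(map_slots, reduce_slots)  # prints "5 2"
```

With the defaults, one NodeManager has room for roughly five map containers or two reduce containers. Raising mapreduce.map.memory.mb without also raising yarn.nodemanager.resource.memory-mb lowers that estimate.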

Add Key

You can also add custom configuration properties related to any MapReduce cluster component.
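
For reference, Hadoop site configuration files store each setting as a property element with name and value children. The Python sketch below shows what a custom key would look like in that layout. How QueryIO stores keys added through this page is not described here, so treat the mapping as an assumption. The key example.custom.key and the helper property_element are hypothetical.

```python
import xml.etree.ElementTree as ET


def property_element(name, value, description=None):
    """Build one Hadoop-style <property> element (hypothetical helper)."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    if description:
        ET.SubElement(prop, "description").text = description
    return prop


# Wrap the property in the <configuration> root used by Hadoop site files.
config = ET.Element("configuration")
config.append(property_element("example.custom.key", 42))  # hypothetical key

xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
```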



Copyright © 2018 QueryIO Corporation. All Rights Reserved.

QueryIO, "Big Data Intelligence" and the QueryIO Logo are trademarks of QueryIO Corporation. Apache, Hadoop and HDFS are trademarks of The Apache Software Foundation.