System Configuration defines the computers, processes, and devices that compose the system and its boundary. More generally, system configuration is the specific definition of the elements that define or prescribe what a system is composed of.
It consists of configuration properties for DataNodes, NameNodes, the Checkpoint Node, High Availability, and HDFS.
To configure DataNode, NameNode, High Availability, or HDFS properties, click Configure HDFS under the ADMIN menu tab. Change the properties as required and click Save to apply them.
The configurable properties are:
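The UI edits these keys for you. For reference, Hadoop daemons read the same keys from XML site files such as `hdfs-site.xml` and `core-site.xml`. The sketch below renders a dictionary of properties in that standard `<configuration>`/`<property>` layout. Whether QueryIO writes exactly this form when you click Save is an assumption, and `to_site_xml` is a hypothetical helper.

```python
from xml.sax.saxutils import escape


def to_site_xml(properties: dict[str, str]) -> str:
    """Render key/value pairs in the <configuration>/<property> layout
    used by Hadoop site files such as hdfs-site.xml."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")    # escape &, <, >
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Values taken from the defaults listed in the table below.
    print(to_site_xml({
        "dfs.replication": "1",
        "dfs.blocksize": "67108864",
    }))
```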
Type | Key | Default Value | Description |
---|---|---|---|
Checkpoint Node | dfs.namenode.secondary.http-address | 0.0.0.0:50090 | The secondary namenode http server address and port. If the port is 0 then the server will start on a free port. |
Checkpoint Node | dfs.namenode.checkpoint.dir | file://${hadoop.tmp.dir}/dfs/namesecondary | Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy. |
Checkpoint Node | dfs.namenode.checkpoint.period | 3600 | The number of seconds between two periodic checkpoints. |
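Checkpoint Node | dfs.namenode.checkpoint.txns | 40000 | The Secondary NameNode or CheckpointNode will create a checkpoint of the namespace every 'dfs.namenode.checkpoint.txns' transactions, regardless of whether 'dfs.namenode.checkpoint.period' has expired. |
Checkpoint Node | dfs.namenode.checkpoint.check.period | 60 | The SecondaryNameNode and CheckpointNode will poll the NameNode every 'dfs.namenode.checkpoint.check.period' seconds to query the number of uncheckpointed transactions. |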
Checkpoint Node | dfs.namenode.num.checkpoints.retained | 2 | The number of image checkpoint files that will be retained by the NameNode and Secondary NameNode in their storage directories. All edit logs necessary to recover an up-to-date namespace from the oldest retained checkpoint will also be retained. |
Checkpoint Node | queryio.secondarynamenode.options | -Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS -Dcom.sun.management.jmxremote.port=9005 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl.need.client.auth=false -Dcom.sun.management.jmxremote.ssl=false | Checkpoint Node specific run-time options. Used by queryio server for hdfs runtime configuration. |
DataNode | dfs.datanode.du.reserved | 0 | Reserved space in bytes per volume. Always leave this much space free for non-DFS use. |
DataNode | dfs.datanode.handler.count | 10 | The number of server threads for the DataNode. |
DataNode | dfs.datanode.address | 0.0.0.0:50010 | The address on which the DataNode server listens. If the port is 0 then the server will start on a free port. |
DataNode | dfs.datanode.http.address | 0.0.0.0:50075 | The DataNode http server address and port. If the port is 0 then the server will start on a free port. |
DataNode | dfs.datanode.ipc.address | 0.0.0.0:50020 | The DataNode ipc server address and port. If the port is 0 then the server will start on a free port. |
DataNode | dfs.datanode.https.address | 0.0.0.0:50475 | The DataNode secure http server address and port. |
DataNode | dfs.datanode.max.transfer.threads | 4096 | Specifies the maximum number of threads to use for transferring data in and out of the DN. |
DataNode | dfs.datanode.data.dir.perm | 700 | Permissions for the directories on the local filesystem where the DFS DataNode stores its blocks. The permissions can be either octal or symbolic. |
DataNode | dfs.datanode.data.dir | file://${hadoop.tmp.dir}/dfs/data | Determines where on the local filesystem the DFS DataNode should store its blocks. If this is a comma-delimited list of directories, then data is stored in all named directories, typically on different devices. |
DataNode | queryio.datanode.data.disk | | DataNode directory. |
DataNode | queryio.datanode.options | -Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS -Dcom.sun.management.jmxremote.port=9006 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl.need.client.auth=false -Dcom.sun.management.jmxremote.ssl=false | Datanode specific runtime options. Used by queryio server for hadoop runtime configuration. |
HDFS | dfs.blocksize | 67108864 | The default block size for new HDFS files. |
HDFS | dfs.nameservices | | Comma-separated list of nameservices. |
HDFS | dfs.ha.namenodes | | The prefix for a given nameservice; contains a comma-separated list of NameNodes for the given nameservice (e.g. EXAMPLENAMESERVICE). |
HDFS | dfs.replication.max | 512 | Maximal block replication. |
HDFS | dfs.permissions.enabled | true | If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories. |
HDFS | dfs.permissions.superusergroup | supergroup | The name of the group of super-users. |
HDFS | dfs.namenode.upgrade.permission | 777 | The permission assigned to files and directories during an upgrade from an HDFS version that did not store permissions. |
HDFS | hadoop.security.groups.cache.secs | 30 | User Group Information cache refresh interval in seconds. |
HDFS | dfs.https.enable | true | Whether HTTPS (SSL) is supported on HDFS. |
HDFS | dfs.https.port | 50470 | The NameNode secure http port. |
HDFS | dfs.https.server.keystore.resource | ssl-server.xml | Resource file from which ssl server keystore information will be extracted. |
HDFS | dfs.client.https.keystore.resource | ssl-client.xml | Resource file from which ssl client keystore information will be extracted. |
HDFS | dfs.client.https.need-auth | true | Whether SSL client certificate authentication is required. |
HDFS | dfs.client.block.write.retries | 3 | The number of retries for writing blocks to the data nodes, before we signal failure to the application. |
HDFS | io.file.buffer.size | 16384 | The size of buffer for use in sequence files. The size of this buffer should probably be a multiple of hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations. |
HDFS | io.bytes.per.checksum | 512 | The number of bytes per checksum. Must not be larger than dfs.stream-buffer-size. |
HDFS | fs.trash.interval | 0 | Number of minutes between trash checkpoints. To disable the trash feature, enter 0. |
HDFS | fs.df.interval | 600000 | Disk usage statistics refresh interval in msec. |
HDFS | hadoop.security.authorization | true | Is service-level authorization enabled? |
HDFS | hadoop.security.group.mapping | com.queryio.plugin.groupinfo.QueryIOGroupInfoServiceProvider | Class for user to group mapping (get groups for a given user) for ACL |
HDFS | queryio.controller.data.fetch.interval | 15 | Data fetch interval in seconds. |
HDFS | queryio.hadoop.options | -server -Xms1024M -Xmn400M -XX:PermSize=128M -XX:MaxPermSize=128M -XX:+UnlockExperimentalVMOptions -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseStringCache -XX:+AggressiveOpts -XX:+EliminateLocks -XX:+UseBiasedLocking -XX:+ExplicitGCInvokesConcurrent -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -Dfile.encoding=UTF-8 | Extra Java runtime options. Used by the QueryIO server for Hadoop runtime configuration. |
HDFS | queryio.hadoop.log-dir | | The directory where log files are stored. Used by the QueryIO server for Hadoop runtime configuration. |
HDFS | queryio.hadoop.pid-dir | | The directory where pid files are stored. Used by the QueryIO server for Hadoop runtime configuration. |
HDFS | queryio.hadoop.heap-size | 4096 | The maximum amount of heap to use, in MB. Used by queryio server for hadoop runtime configuration. |
High Availability | dfs.replication | 1 | Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time. |
High Availability | dfs.replication.min | 1 | Minimal block replication. |
High Availability | dfs.heartbeat.interval | 3 | Determines DataNode heartbeat interval in seconds. |
High Availability | dfs.namenode.heartbeat.recheck-interval | 300000 | Determines DataNode heartbeat recheck interval in milliseconds. |
High Availability | fs.checkpoint.period | 3600 | The number of seconds between two periodic checkpoints. |
High Availability | dfs.datanode.scan.period.hours | 0 | Interval in hours for the DataNode to scan data directories and reconcile the difference between blocks in memory and on the disk. If set to 0, the interval defaults to 3 weeks. |
High Availability | dfs.blockreport.intervalMsec | 21600000 | Determines block reporting interval in milliseconds. |
High Availability | queryio.agent.monitor.interval | 10 | Agent monitor interval in minutes. |
High Availability | queryio.node.monitor.interval | 60 | Node monitor interval in seconds. |
NameNode | fs.permissions.umask-mode | 022 | The umask used when creating files and directories. |
NameNode | dfs.umaskmode | 022 | The umask used when creating files and directories (older name for fs.permissions.umask-mode). |
NameNode | dfs.client.failover.proxy.provider.mycluster | org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider | The Java class that HDFS clients use to contact the Active NameNode. |
NameNode | dfs.ha.fencing.methods | sshfence | A list of scripts or Java classes which will be used to fence the Active NameNode during a failover. |
NameNode | dfs.ha.fencing.ssh.private-key-files | /root/.ssh/id_rsa | A comma-separated list of SSH private key files. |
NameNode | dfs.namenode.name.dir | file://${hadoop.tmp.dir}/dfs/name | Determines where on the local filesystem the DFS NameNode should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy. |
NameNode | dfs.namenode.shared.edits.dir | | A directory on shared storage between the multiple NameNodes in an HA cluster. This directory will be written by the active and read by the standby in order to keep the namespaces synchronized. This directory does not need to be listed in dfs.namenode.edits.dir. It should be left empty in a non-HA cluster. |
NameNode | dfs.hosts | | Names a file that contains a list of hosts that are permitted to connect to the NameNode. The full pathname of the file must be specified. If the value is empty, all hosts are permitted. |
NameNode | dfs.hosts.exclude | | Names a file that contains a list of hosts that are not permitted to connect to the NameNode. The full pathname of the file must be specified. If the value is empty, no hosts are excluded. |
NameNode | dfs.namenode.http-address | 0.0.0.0:50070 | The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port. |
NameNode | dfs.namenode.https-address | 0.0.0.0:50470 | The namenode secure http server address and port. |
NameNode | dfs.namenode.rpc-address | 0.0.0.0:9000 | The fully-qualified RPC address for each NameNode for a given nameservice to listen on |
NameNode | dfs.namenode.handler.count | 100 | The number of server threads for the NameNode. |
NameNode | dfs.namenode.safemode.threshold-pct | 0.999f | Specifies the percentage of blocks that should satisfy the minimal replication requirement defined by dfs.namenode.replication.min. Values less than or equal to 0 mean not to wait for any particular percentage of blocks before exiting safemode. Values greater than 1 will make safe mode permanent. |
NameNode | dfs.namenode.safemode.extension | 30000 | Determines extension of safe mode in milliseconds after the threshold level is reached. |
NameNode | dfs.nameservice.id | | The ID of this nameservice. If the nameservice ID is not configured or more than one nameservice is configured for dfs.federation.nameservices, it is determined automatically by matching the local node. |
NameNode | dfs.block.replicator.classname | org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault | HDFS block placement policy. |
NameNode | net.topology.script.file.name | | The script name that should be invoked to resolve DNS names to NetworkTopology names. Example: the script would take host.foo.bar as an argument, and return /rack1 as the output. |
NameNode | net.topology.script.number.args | 1 | The max number of args that the script configured with net.topology.script.file.name should be run with. Each arg is an IP address. |
NameNode | queryio.namenode.options | -Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS -Dcom.sun.management.jmxremote.port=9004 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl.need.client.auth=false -Dcom.sun.management.jmxremote.ssl=false | NameNode specific runtime options. Used by queryio server for hadoop runtime configuration. |
NameNode | queryio.namenode.data.disk | | NameNode directory. |
NameNode | queryio.customtag.db.dbsourceid | | QueryIO custom tag pool name. |
NameNode | queryio.os3server.port | 5667 | QueryIO Services port specific to node. |
NameNode | queryio.hdfsoverftp.port | 5669 | QueryIO Services port specific to node. |
NameNode | queryio.ftpserver.port | 5660 | QueryIO Services port specific to node. |
NameNode | queryio.ftpserver.ssl.enabled | false | QueryIO Secure FTP enabled. |
NameNode | queryio.ftpserver.ssl.port | 5670 | QueryIO Secure FTP port specific to node. |
NameNode | queryio.ftpserver.ssl.keystore | | SSL keystore for QueryIO Secure FTP specific to node. |
NameNode | queryio.ftpserver.ssl.password | hadoop | SSL password for QueryIO Secure FTP specific to node. |
NameNode | queryio.server.url | http://localhost:5678/queryio/ | URL of the QueryIO server. |
NameNode | fs.defaultFS | file:/// | The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. |
NameNode | queryio.dfs.data.encryption.key | vdphkLF2eWW4k0h542VX1gKZWaT2JrIY | Server side data encryption key |
NOTE: Descriptions of the standard Hadoop properties are taken from the Apache Hadoop documentation.
You can also add custom configuration properties related to any HDFS cluster component.
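The three checkpoint timing properties (`dfs.namenode.checkpoint.period`, `dfs.namenode.checkpoint.txns`, and `dfs.namenode.checkpoint.check.period`) combine as an either/or trigger. The following Python sketch models that rule with the defaults from the table. It is a simplified illustration of the documented behaviour, not Hadoop's actual implementation, and `CheckpointSettings` and `should_checkpoint` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointSettings:
    period_s: int = 3600       # dfs.namenode.checkpoint.period
    txns: int = 40000          # dfs.namenode.checkpoint.txns
    check_period_s: int = 60   # dfs.namenode.checkpoint.check.period


def should_checkpoint(seconds_since_last: float,
                      uncheckpointed_txns: int,
                      cfg: CheckpointSettings = CheckpointSettings()) -> bool:
    """Decision made at each poll of the NameNode.

    A checkpoint is due when the period has elapsed OR enough
    transactions have accumulated, whichever happens first.
    """
    return (seconds_since_last >= cfg.period_s
            or uncheckpointed_txns >= cfg.txns)


if __name__ == "__main__":
    print(should_checkpoint(120, 50000))  # True: transaction threshold reached
    print(should_checkpoint(120, 100))    # False: neither limit reached
    print(should_checkpoint(3600, 0))     # True: period elapsed
```

Because the transaction count is only examined once per poll, a checkpoint can start up to `check_period_s` (60 seconds by default) after the transaction threshold is actually crossed.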
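`dfs.blocksize` and `dfs.replication` together determine how a file is split into blocks and how much raw disk it consumes. The sketch below assumes the defaults from the table (64 MiB blocks, replication 1) and relies on the fact that HDFS does not pad the last block of a file to the full block size. The helper names are hypothetical.

```python
BLOCK_SIZE = 67108864  # dfs.blocksize default: 64 MiB
REPLICATION = 1        # dfs.replication default in this table


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of HDFS blocks for a file (ceiling division; empty file -> 0)."""
    return (file_size + block_size - 1) // block_size


def raw_bytes(file_size: int, replication: int = REPLICATION) -> int:
    """Raw DataNode storage used: every replica stores the whole file,
    but the final block only occupies its actual length."""
    return file_size * replication


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(block_count(one_gib))            # 16
    print(block_count(200 * 1024 ** 2))    # 4 (three full blocks plus a partial one)
    print(raw_bytes(200 * 1024 ** 2, 3))   # 629145600
```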
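`dfs.heartbeat.interval` and `dfs.namenode.heartbeat.recheck-interval` jointly decide how long the NameNode waits before marking a silent DataNode as dead. The formula below is the one used by the Apache Hadoop NameNode (stated from the Hadoop source; verify it against the Hadoop version bundled with your installation). The function name is hypothetical.

```python
def dead_node_timeout_ms(heartbeat_interval_s: int = 3,
                         recheck_interval_ms: int = 300000) -> int:
    """Time after the last heartbeat before a DataNode is considered dead.

    heartbeat_interval_s -> dfs.heartbeat.interval (seconds)
    recheck_interval_ms  -> dfs.namenode.heartbeat.recheck-interval (ms)
    """
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


if __name__ == "__main__":
    timeout = dead_node_timeout_ms()
    print(timeout, "ms =", timeout / 60000, "minutes")  # 630000 ms = 10.5 minutes
```

With the table's defaults, a DataNode is declared dead roughly 10.5 minutes after its last heartbeat; lowering the recheck interval has the largest effect because it is doubled.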
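The umask keys (`fs.permissions.umask-mode` and its older name `dfs.umaskmode`) remove permission bits rather than set them. HDFS starts new directories from 0777 and new files from 0666, so the default 022 yields 0755 and 0644. The sketch uses Python's standard `stat.filemode` to show the symbolic form, which also illustrates the 700 default of `dfs.datanode.data.dir.perm`; `apply_umask` is a hypothetical helper.

```python
import stat

DIR_DEFAULT = 0o777   # HDFS default for new directories before the umask
FILE_DEFAULT = 0o666  # HDFS default for new files before the umask


def apply_umask(requested: int, umask: int = 0o022) -> int:
    """Clear every permission bit that is set in the umask."""
    return requested & ~umask


if __name__ == "__main__":
    d = apply_umask(DIR_DEFAULT)
    f = apply_umask(FILE_DEFAULT)
    print(oct(d), stat.filemode(stat.S_IFDIR | d))  # 0o755 drwxr-xr-x
    print(oct(f), stat.filemode(stat.S_IFREG | f))  # 0o644 -rw-r--r--
    # dfs.datanode.data.dir.perm = 700
    print(stat.filemode(stat.S_IFDIR | 0o700))      # drwx------
```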
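`dfs.namenode.safemode.threshold-pct` controls when the NameNode may leave safe mode. The simplified check below follows the table's description of the threshold, including its edge cases. It ignores the additional `dfs.namenode.safemode.extension` wait and is not Hadoop's actual code; `may_leave_safemode` is a hypothetical name.

```python
def may_leave_safemode(safe_blocks: int, total_blocks: int,
                       threshold_pct: float = 0.999) -> bool:
    """safe_blocks: blocks meeting the minimal replication requirement."""
    if threshold_pct <= 0:
        return True    # do not wait for any particular percentage of blocks
    if threshold_pct > 1:
        return False   # safe mode is permanent
    if total_blocks == 0:
        return True    # no blocks to wait for
    return safe_blocks / total_blocks >= threshold_pct


if __name__ == "__main__":
    print(may_leave_safemode(1000, 1000))       # True
    print(may_leave_safemode(990, 1000))        # False: only 99% reported
    print(may_leave_safemode(1000, 1000, 1.5))  # False: permanent safe mode
```

Once the threshold is met, the NameNode still remains in safe mode for the `dfs.namenode.safemode.extension` period (30000 ms, i.e. 30 seconds, by default).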
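A script named by `net.topology.script.file.name` receives host names or IP addresses as arguments (at most `net.topology.script.number.args` per call) and prints one rack path per argument. The Python sketch below is one way to write such a script. The `RACKS` mapping is hypothetical, and returning `/default-rack` for unknown hosts is an assumption that mirrors Hadoop's default rack name.

```python
#!/usr/bin/env python3
"""Minimal rack-awareness script for net.topology.script.file.name."""
import sys

# Hypothetical host-to-rack mapping; replace with your own layout.
RACKS = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.21": "/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve(hosts: list[str]) -> list[str]:
    """Return one rack path per host, in the same order."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes up to net.topology.script.number.args hosts per call.
    print(" ".join(resolve(sys.argv[1:])))
```

For example, running the script as `topology.py 10.0.1.11 10.9.9.9` prints `/rack1 /default-rack`. With `net.topology.script.number.args` left at 1, Hadoop invokes it once per address.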