Configure HDFS

System configuration defines the computers, processes, and devices that make up the system and its boundary. More generally, the system configuration is the specific definition of the elements that a system is composed of.

It consists of configuration properties for DataNodes, NameNodes, the Checkpoint Node, High Availability, and HDFS in general.

To configure DataNode, NameNode, High Availability, or HDFS properties, click Configure HDFS under the ADMIN menu tab. Change the properties according to your requirements and click Save to update them.

Various properties that can be configured are:

| Type | Key | Default Value | Description |
| --- | --- | --- | --- |
| Checkpoint Node | dfs.namenode.secondary.http-address | 0.0.0.0:50090 | The secondary namenode http server address and port. If the port is 0 then the server will start on a free port. |
| Checkpoint Node | dfs.namenode.checkpoint.dir | file://${hadoop.tmp.dir}/dfs/namesecondary | Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy. |
| Checkpoint Node | dfs.namenode.checkpoint.period | 3600 | The number of seconds between two periodic checkpoints. |
| Checkpoint Node | dfs.namenode.checkpoint.txns | 40000 | The Secondary NameNode or CheckpointNode will create a checkpoint of the namespace every dfs.namenode.checkpoint.txns transactions, regardless of whether dfs.namenode.checkpoint.period has expired. |
| Checkpoint Node | dfs.namenode.checkpoint.check.period | 60 | The SecondaryNameNode and CheckpointNode will poll the NameNode every dfs.namenode.checkpoint.check.period seconds to query the number of uncheckpointed transactions. |
| Checkpoint Node | dfs.namenode.num.checkpoints.retained | 2 | The number of image checkpoint files that will be retained by the NameNode and Secondary NameNode in their storage directories. All edit logs necessary to recover an up-to-date namespace from the oldest retained checkpoint will also be retained. |
| Checkpoint Node | queryio.secondarynamenode.options | -Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS -Dcom.sun.management.jmxremote.port=9005 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl.need.client.auth=false -Dcom.sun.management.jmxremote.ssl=false | Checkpoint Node specific run-time options. Used by the QueryIO server for HDFS runtime configuration. |
| DataNode | dfs.datanode.du.reserved | 0 | Reserved space in bytes per volume. Always leave this much space free for non-DFS use. |
| DataNode | dfs.datanode.handler.count | 10 | The number of server threads for the DataNode. |
| DataNode | dfs.datanode.address | 0.0.0.0:50010 | The address where the DataNode server will listen. If the port is 0 then the server will start on a free port. |
| DataNode | dfs.datanode.http.address | 0.0.0.0:50075 | The DataNode http server address and port. If the port is 0 then the server will start on a free port. |
| DataNode | dfs.datanode.ipc.address | 0.0.0.0:50020 | The DataNode ipc server address and port. If the port is 0 then the server will start on a free port. |
| DataNode | dfs.datanode.https.address | 0.0.0.0:50475 | The DataNode secure http server address and port. |
| DataNode | dfs.datanode.max.transfer.threads | 4096 | Specifies the maximum number of threads to use for transferring data in and out of the DataNode. |
| DataNode | dfs.datanode.data.dir.perm | 700 | Permissions for the directories on the local filesystem where the DFS data node stores its blocks. The permissions can either be octal or symbolic. |
| DataNode | dfs.datanode.data.dir | file://${hadoop.tmp.dir}/dfs/data | Determines where on the local filesystem the DFS data node should store its blocks. If this is a comma-delimited list of directories then data will be stored in all named directories, typically on different devices. |
| DataNode | queryio.datanode.data.disk | | Datanode directory. |
| DataNode | queryio.datanode.options | -Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS -Dcom.sun.management.jmxremote.port=9006 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl.need.client.auth=false -Dcom.sun.management.jmxremote.ssl=false | DataNode specific runtime options. Used by the QueryIO server for Hadoop runtime configuration. |
| HDFS | dfs.blocksize | 67108864 | The default block size for new HDFS files. |
| HDFS | dfs.nameservices | | Comma-separated list of nameservices. |
| HDFS | dfs.ha.namenodes | | The prefix for a given nameservice; contains a comma-separated list of NameNodes for that nameservice (e.g. EXAMPLENAMESERVICE). |
| HDFS | dfs.replication.max | 512 | Maximal block replication. |
| HDFS | dfs.permissions.enabled | true | If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories. |
| HDFS | dfs.permissions.superusergroup | supergroup | The name of the group of super-users. |
| HDFS | dfs.namenode.upgrade.permission | 777 | Default permission used for files created during an HDFS upgrade. |
| HDFS | hadoop.security.groups.cache.secs | 30 | User Group Information cache refresh interval in seconds. |
| HDFS | dfs.https.enable | true | Decide if HTTPS (SSL) is supported on HDFS. |
| HDFS | dfs.https.port | 50470 | The NameNode secure http port. |
| HDFS | dfs.https.server.keystore.resource | ssl-server.xml | Resource file from which ssl server keystore information will be extracted. |
| HDFS | dfs.client.https.keystore.resource | ssl-client.xml | Resource file from which ssl client keystore information will be extracted. |
| HDFS | dfs.client.https.need-auth | true | Whether SSL client certificate authentication is required. |
| HDFS | dfs.client.block.write.retries | 3 | The number of retries for writing blocks to the data nodes, before we signal failure to the application. |
| HDFS | io.file.buffer.size | 16384 | The size of buffer for use in sequence files. The size of this buffer should probably be a multiple of hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations. |
| HDFS | io.bytes.per.checksum | 512 | The number of bytes per checksum. Must not be larger than dfs.stream-buffer-size. |
| HDFS | fs.trash.interval | 0 | Number of minutes between trash checkpoints. To disable the trash feature, enter 0. |
| HDFS | fs.df.interval | 600000 | Disk usage statistics refresh interval in msec. |
| HDFS | hadoop.security.authorization | true | Is service-level authorization enabled? |
| HDFS | hadoop.security.group.mapping | com.queryio.plugin.groupinfo.QueryIOGroupInfoServiceProvider | Class for user-to-group mapping (get groups for a given user) for ACL. |
| HDFS | queryio.controller.data.fetch.interval | 15 | Data fetch interval in seconds. |
| HDFS | queryio.hadoop.options | -server -Xms1024M -Xmn400M -XX:PermSize=128M -XX:MaxPermSize=128M -XX:+UnlockExperimentalVMOptions -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseStringCache -XX:+AggressiveOpts -XX:+EliminateLocks -XX:+UseBiasedLocking -XX:+ExplicitGCInvokesConcurrent -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -Dfile.encoding=UTF-8 | Extra Java runtime options. Used by the QueryIO server for Hadoop runtime configuration. |
| HDFS | queryio.hadoop.log-dir | | Where log files are stored. Used by the QueryIO server for Hadoop runtime configuration. |
| HDFS | queryio.hadoop.pid-dir | | The directory where pid files are stored. Used by the QueryIO server for Hadoop runtime configuration. |
| HDFS | queryio.hadoop.heap-size | 4096 | The maximum amount of heap to use, in MB. Used by the QueryIO server for Hadoop runtime configuration. |
| High Availability | dfs.replication | 1 | Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time. |
| High Availability | dfs.replication.min | 1 | Minimal block replication. |
| High Availability | dfs.heartbeat.interval | 3 | Determines DataNode heartbeat interval in seconds. |
| High Availability | dfs.namenode.heartbeat.recheck-interval | 300000 | Determines DataNode heartbeat recheck interval in milliseconds. |
| High Availability | fs.checkpoint.period | 3600 | The number of seconds between two periodic checkpoints. |
| High Availability | dfs.datanode.scan.period.hours | 0 | Interval in hours for the DataNode to scan data directories and reconcile the difference between blocks in memory and on the disk. If set to 0, the interval defaults to 3 weeks. |
| High Availability | dfs.blockreport.intervalMsec | 21600000 | Determines block reporting interval in milliseconds. |
| High Availability | queryio.agent.monitor.interval | 10 | Agent monitor interval in minutes. |
| High Availability | queryio.node.monitor.interval | 60 | Node monitor interval in seconds. |
| NameNode | fs.permissions.umask-mode | 022 | The umask used when creating files and directories. |
| NameNode | dfs.umaskmode | 022 | The umask used when creating files and directories. |
| NameNode | dfs.client.failover.proxy.provider.mycluster | org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider | The Java class that HDFS clients use to contact the Active NameNode. |
| NameNode | dfs.ha.fencing.methods | sshfence | A list of scripts or Java classes which will be used to fence the Active NameNode during a failover. |
| NameNode | dfs.ha.fencing.ssh.private-key-files | /root/.ssh/id_rsa | A comma-separated list of SSH private key files. |
| NameNode | dfs.namenode.name.dir | file://${hadoop.tmp.dir}/dfs/name | Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy. |
| NameNode | dfs.namenode.shared.edits.dir | | A directory on shared storage between the multiple NameNodes in an HA cluster. This directory will be written by the active and read by the standby in order to keep the namespaces synchronized. This directory does not need to be listed in dfs.namenode.edits.dir. It should be left empty in a non-HA cluster. |
| NameNode | dfs.hosts | | Names a file that contains a list of hosts that are permitted to connect to the NameNode. The full pathname of the file must be specified. If the value is empty, all hosts are permitted. |
| NameNode | dfs.hosts.exclude | | Names a file that contains a list of hosts that are not permitted to connect to the NameNode. The full pathname of the file must be specified. If the value is empty, no hosts are excluded. |
| NameNode | dfs.namenode.http-address | 0.0.0.0:50070 | The address and base port on which the dfs namenode web UI will listen. If the port is 0 then the server will start on a free port. |
| NameNode | dfs.namenode.https-address | 0.0.0.0:50470 | The namenode secure http server address and port. |
| NameNode | dfs.namenode.rpc-address | 0.0.0.0:9000 | The fully-qualified RPC address for each NameNode of a given nameservice to listen on. |
| NameNode | dfs.namenode.handler.count | 100 | The number of server threads for the NameNode. |
| NameNode | dfs.namenode.safemode.threshold-pct | 0.999f | Specifies the percentage of blocks that should satisfy the minimal replication requirement defined by dfs.namenode.replication.min. Values less than or equal to 0 mean not to wait for any particular percentage of blocks before exiting safemode. Values greater than 1 will make safe mode permanent. |
| NameNode | dfs.namenode.safemode.extension | 30000 | Determines extension of safe mode in milliseconds after the threshold level is reached. |
| NameNode | dfs.nameservice.id | | The ID of this nameservice. If the nameservice ID is not configured, or more than one nameservice is configured for dfs.federation.nameservices, it is determined automatically by matching the local node. |
| NameNode | dfs.block.replicator.classname | org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault | HDFS block placement policy class. |
| NameNode | net.topology.script.file.name | | The script name that should be invoked to resolve DNS names to NetworkTopology names. Example: the script would take host.foo.bar as an argument, and return /rack1 as the output. |
| NameNode | net.topology.script.number.args | 1 | The max number of args that the script configured with net.topology.script.file.name should be run with. Each arg is an IP address. |
| NameNode | queryio.namenode.options | -Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS -Dcom.sun.management.jmxremote.port=9004 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl.need.client.auth=false -Dcom.sun.management.jmxremote.ssl=false | NameNode specific runtime options. Used by the QueryIO server for Hadoop runtime configuration. |
| NameNode | queryio.namenode.data.disk | | NameNode directory. |
| NameNode | queryio.customtag.db.dbsourceid | | QueryIO custom tag pool name. |
| NameNode | queryio.os3server.port | 5667 | QueryIO service port specific to this node. |
| NameNode | queryio.hdfsoverftp.port | 5669 | QueryIO service port specific to this node. |
| NameNode | queryio.ftpserver.port | 5660 | QueryIO service port specific to this node. |
| NameNode | queryio.ftpserver.ssl.enabled | false | Whether QueryIO Secure FTP is enabled. |
| NameNode | queryio.ftpserver.ssl.port | 5670 | QueryIO Secure FTP port specific to this node. |
| NameNode | queryio.ftpserver.ssl.keystore | | SSL keystore for QueryIO Secure FTP, specific to this node. |
| NameNode | queryio.ftpserver.ssl.password | hadoop | SSL password for QueryIO Secure FTP, specific to this node. |
| NameNode | queryio.server.url | http://localhost:5678/queryio/ | QueryIO server URL specific to this node. |
| NameNode | fs.defaultFS | file:/// | The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. |
| NameNode | queryio.dfs.data.encryption.key | vdphkLF2eWW4k0h542VX1gKZWaT2JrIY | Server-side data encryption key. |

NOTE: Descriptions are taken from the Apache Hadoop documentation.
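
Most of the keys above are standard Hadoop configuration properties, so their current values can also be read programmatically through the usual Hadoop Configuration API. The sketch below is an illustration only, not part of QueryIO; it assumes the Hadoop client libraries and the cluster's configuration files are on the classpath, and prints a few of the values listed in the table:

```java
import org.apache.hadoop.conf.Configuration;

public class ShowHdfsSettings {
    public static void main(String[] args) {
        // Loads core-site.xml and hdfs-site.xml found on the classpath.
        Configuration conf = new Configuration();

        // Read a few of the keys listed above; the second argument is the
        // fallback used when the key is not set (values taken from the table).
        long blockSize = conf.getLong("dfs.blocksize", 67108864L);
        int maxReplication = conf.getInt("dfs.replication.max", 512);
        int handlerCount = conf.getInt("dfs.namenode.handler.count", 100);
        String nameDirs = conf.get("dfs.namenode.name.dir");

        System.out.println("dfs.blocksize              = " + blockSize);
        System.out.println("dfs.replication.max        = " + maxReplication);
        System.out.println("dfs.namenode.handler.count = " + handlerCount);
        System.out.println("dfs.namenode.name.dir      = " + nameDirs);
    }
}
```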

Add Key

You can also add custom configuration properties related to any HDFS cluster component.
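
A key added this way behaves like any other Hadoop configuration property once the cluster picks it up. As a minimal sketch (the key name queryio.custom.example below is hypothetical and used only for illustration), such a value could be read back from client code with the standard Configuration API:

```java
import org.apache.hadoop.conf.Configuration;

public class ReadCustomKey {
    public static void main(String[] args) {
        // Loads core-site.xml and hdfs-site.xml found on the classpath.
        Configuration conf = new Configuration();
        // "queryio.custom.example" is a hypothetical key added via Add Key;
        // the second argument is the fallback returned when the key is not set.
        String custom = conf.get("queryio.custom.example", "fallback-value");
        System.out.println("queryio.custom.example = " + custom);
    }
}
```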



Copyright © 2018 QueryIO Corporation. All Rights Reserved.

QueryIO, "Big Data Intelligence" and the QueryIO Logo are trademarks of QueryIO Corporation. Apache, Hadoop and HDFS are trademarks of The Apache Software Foundation.