Configure HDFS

System configuration defines the computers, processes, and devices that make up the system and its boundary. More generally, the system configuration is the specific definition of the elements that a system is composed of.

It consists of configuration properties for DataNodes, NameNodes, the Checkpoint Node, High Availability, and HDFS in general.

To configure DataNode, NameNode, High Availability, or HDFS properties, click Configure HDFS under the ADMIN menu tab. Change the properties according to your requirements and click Save to update them.

Various properties that can be configured are:

| Type | Key | Default Value | Description |
| --- | --- | --- | --- |
| Checkpoint Node | dfs.namenode.secondary.http-address | 0.0.0.0:50090 | The secondary namenode http server address and port. If the port is 0 then the server will start on a free port. |
| Checkpoint Node | dfs.namenode.checkpoint.dir | file://${hadoop.tmp.dir}/dfs/namesecondary | Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy. |
| Checkpoint Node | dfs.namenode.checkpoint.period | 3600 | The number of seconds between two periodic checkpoints. |
| Checkpoint Node | dfs.namenode.checkpoint.txns | 40000 | The Secondary NameNode or CheckpointNode will create a checkpoint of the namespace every dfs.namenode.checkpoint.txns transactions, regardless of whether dfs.namenode.checkpoint.period has expired. |
| Checkpoint Node | dfs.namenode.checkpoint.check.period | 60 | The SecondaryNameNode and CheckpointNode will poll the NameNode every dfs.namenode.checkpoint.check.period seconds to query the number of uncheckpointed transactions. |
| Checkpoint Node | dfs.namenode.num.checkpoints.retained | 2 | The number of image checkpoint files that will be retained by the NameNode and Secondary NameNode in their storage directories. All edit logs necessary to recover an up-to-date namespace from the oldest retained checkpoint will also be retained. |
| Checkpoint Node | queryio.secondarynamenode.options | -Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS -Dcom.sun.management.jmxremote.port=9005 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl.need.client.auth=false -Dcom.sun.management.jmxremote.ssl=false | Checkpoint Node specific run-time options. Used by the QueryIO server for HDFS runtime configuration. |
| DataNode | dfs.datanode.du.reserved | 0 | Reserved space in bytes per volume. Always leave this much space free for non-DFS use. |
| DataNode | dfs.datanode.handler.count | 10 | The number of server threads for the DataNode. |
| DataNode | dfs.datanode.address | 0.0.0.0:50010 | The address where the DataNode server will listen. If the port is 0 then the server will start on a free port. |
| DataNode | dfs.datanode.http.address | 0.0.0.0:50075 | The DataNode http server address and port. If the port is 0 then the server will start on a free port. |
| DataNode | dfs.datanode.ipc.address | 0.0.0.0:50020 | The DataNode ipc server address and port. If the port is 0 then the server will start on a free port. |
| DataNode | dfs.datanode.https.address | 0.0.0.0:50475 | The DataNode secure http server address and port. |
| DataNode | dfs.datanode.max.transfer.threads | 4096 | Specifies the maximum number of threads to use for transferring data in and out of the DataNode. |
| DataNode | dfs.datanode.data.dir.perm | 700 | Permissions for the directories on the local filesystem where the DFS data node stores its blocks. The permissions can either be octal or symbolic. |
| DataNode | dfs.datanode.data.dir | file://${hadoop.tmp.dir}/dfs/data | Determines where on the local filesystem the DFS data node should store its blocks. If this is a comma-delimited list of directories then data will be stored in all named directories, typically on different devices. |
| DataNode | queryio.datanode.data.disk | | Datanode directory. |
| DataNode | queryio.datanode.options | -Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS -Dcom.sun.management.jmxremote.port=9006 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl.need.client.auth=false -Dcom.sun.management.jmxremote.ssl=false | DataNode specific runtime options. Used by the QueryIO server for Hadoop runtime configuration. |
| HDFS | dfs.blocksize | 67108864 | The default block size for new HDFS files. |
| HDFS | dfs.nameservices | | Comma-separated list of nameservices. |
| HDFS | dfs.ha.namenodes | | The prefix for a given nameservice; contains a comma-separated list of NameNodes for that nameservice (e.g. EXAMPLENAMESERVICE). |
| HDFS | dfs.replication.max | 512 | Maximal block replication. |
| HDFS | dfs.permissions.enabled | true | If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories. |
| HDFS | dfs.permissions.superusergroup | supergroup | The name of the group of super-users. |
| HDFS | dfs.namenode.upgrade.permission | 777 | Default permission used for files created during an HDFS upgrade. |
| HDFS | hadoop.security.groups.cache.secs | 30 | User Group Information cache refresh interval in seconds. |
| HDFS | dfs.https.enable | true | Decide if HTTPS (SSL) is supported on HDFS. |
| HDFS | dfs.https.port | 50470 | The NameNode secure http port. |
| HDFS | dfs.https.server.keystore.resource | ssl-server.xml | Resource file from which ssl server keystore information will be extracted. |
| HDFS | dfs.client.https.keystore.resource | ssl-client.xml | Resource file from which ssl client keystore information will be extracted. |
| HDFS | dfs.client.https.need-auth | true | Whether SSL client certificate authentication is required. |
| HDFS | dfs.client.block.write.retries | 3 | The number of retries for writing blocks to the data nodes, before we signal failure to the application. |
| HDFS | io.file.buffer.size | 16384 | The size of buffer for use in sequence files. The size of this buffer should probably be a multiple of hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations. |
| HDFS | io.bytes.per.checksum | 512 | The number of bytes per checksum. Must not be larger than dfs.stream-buffer-size. |
| HDFS | fs.trash.interval | 0 | Number of minutes between trash checkpoints. To disable the trash feature, enter 0. |
| HDFS | fs.df.interval | 600000 | Disk usage statistics refresh interval in msec. |
| HDFS | hadoop.security.authorization | true | Is service-level authorization enabled? |
| HDFS | hadoop.security.group.mapping | com.queryio.plugin.groupinfo.QueryIOGroupInfoServiceProvider | Class for user-to-group mapping (get groups for a given user) for ACL. |
| HDFS | queryio.controller.data.fetch.interval | 15 | Data fetch interval in seconds. |
| HDFS | queryio.hadoop.options | -server -Xms1024M -Xmn400M -XX:PermSize=128M -XX:MaxPermSize=128M -XX:+UnlockExperimentalVMOptions -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseStringCache -XX:+AggressiveOpts -XX:+EliminateLocks -XX:+UseBiasedLocking -XX:+ExplicitGCInvokesConcurrent -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -Dfile.encoding=UTF-8 | Extra Java runtime options. Used by the QueryIO server for Hadoop runtime configuration. |
| HDFS | queryio.hadoop.log-dir | | Where log files are stored. Used by the QueryIO server for Hadoop runtime configuration. |
| HDFS | queryio.hadoop.pid-dir | | The directory where pid files are stored. Used by the QueryIO server for Hadoop runtime configuration. |
| HDFS | queryio.hadoop.heap-size | 4096 | The maximum amount of heap to use, in MB. Used by the QueryIO server for Hadoop runtime configuration. |
| High Availability | dfs.replication | 1 | Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time. |
| High Availability | dfs.replication.min | 1 | Minimal block replication. |
| High Availability | dfs.heartbeat.interval | 3 | Determines DataNode heartbeat interval in seconds. |
| High Availability | dfs.namenode.heartbeat.recheck-interval | 300000 | Determines DataNode heartbeat recheck interval in milliseconds. |
| High Availability | fs.checkpoint.period | 3600 | The number of seconds between two periodic checkpoints. |
| High Availability | dfs.datanode.scan.period.hours | 0 | Interval in hours for the DataNode to scan data directories and reconcile the difference between blocks in memory and on the disk. If set to 0, the interval defaults to 3 weeks. |
| High Availability | dfs.blockreport.intervalMsec | 21600000 | Determines block reporting interval in milliseconds. |
| High Availability | queryio.agent.monitor.interval | 10 | Agent monitor interval in minutes. |
| High Availability | queryio.node.monitor.interval | 60 | Node monitor interval in seconds. |
| NameNode | fs.permissions.umask-mode | 022 | The umask used when creating files and directories. |
| NameNode | dfs.umaskmode | 022 | The umask used when creating files and directories. |
| NameNode | dfs.client.failover.proxy.provider.mycluster | org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider | The Java class that HDFS clients use to contact the Active NameNode. |
| NameNode | dfs.ha.fencing.methods | sshfence | A list of scripts or Java classes which will be used to fence the Active NameNode during a failover. |
| NameNode | dfs.ha.fencing.ssh.private-key-files | /root/.ssh/id_rsa | A comma-separated list of SSH private key files. |
| NameNode | dfs.namenode.name.dir | file://${hadoop.tmp.dir}/dfs/name | Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy. |
| NameNode | dfs.namenode.shared.edits.dir | | A directory on shared storage between the multiple NameNodes in an HA cluster. This directory will be written by the active and read by the standby in order to keep the namespaces synchronized. This directory does not need to be listed in dfs.namenode.edits.dir. It should be left empty in a non-HA cluster. |
| NameNode | dfs.hosts | | Names a file that contains a list of hosts that are permitted to connect to the NameNode. The full pathname of the file must be specified. If the value is empty, all hosts are permitted. |
| NameNode | dfs.hosts.exclude | | Names a file that contains a list of hosts that are not permitted to connect to the NameNode. The full pathname of the file must be specified. If the value is empty, no hosts are excluded. |
| NameNode | dfs.namenode.http-address | 0.0.0.0:50070 | The address and base port on which the dfs namenode web UI will listen. If the port is 0 then the server will start on a free port. |
| NameNode | dfs.namenode.https-address | 0.0.0.0:50470 | The namenode secure http server address and port. |
| NameNode | dfs.namenode.rpc-address | 0.0.0.0:9000 | The fully-qualified RPC address for each NameNode of a given nameservice to listen on. |
| NameNode | dfs.namenode.handler.count | 100 | The number of server threads for the NameNode. |
| NameNode | dfs.namenode.safemode.threshold-pct | 0.999f | Specifies the percentage of blocks that should satisfy the minimal replication requirement defined by dfs.namenode.replication.min. Values less than or equal to 0 mean not to wait for any particular percentage of blocks before exiting safemode. Values greater than 1 will make safe mode permanent. |
| NameNode | dfs.namenode.safemode.extension | 30000 | Determines extension of safe mode in milliseconds after the threshold level is reached. |
| NameNode | dfs.nameservice.id | | The ID of this nameservice. If the nameservice ID is not configured, or more than one nameservice is configured for dfs.federation.nameservices, it is determined automatically by matching the local node. |
| NameNode | dfs.block.replicator.classname | org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault | HDFS block placement policy class. |
| NameNode | net.topology.script.file.name | | The script name that should be invoked to resolve DNS names to NetworkTopology names. Example: the script would take host.foo.bar as an argument, and return /rack1 as the output. |
| NameNode | net.topology.script.number.args | 1 | The max number of args that the script configured with net.topology.script.file.name should be run with. Each arg is an IP address. |
| NameNode | queryio.namenode.options | -Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS -Dcom.sun.management.jmxremote.port=9004 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl.need.client.auth=false -Dcom.sun.management.jmxremote.ssl=false | NameNode specific runtime options. Used by the QueryIO server for Hadoop runtime configuration. |
| NameNode | queryio.namenode.data.disk | | NameNode directory. |
| NameNode | queryio.customtag.db.dbsourceid | | QueryIO custom tag pool name. |
| NameNode | queryio.os3server.port | 5667 | QueryIO service port specific to this node. |
| NameNode | queryio.hdfsoverftp.port | 5669 | QueryIO service port specific to this node. |
| NameNode | queryio.ftpserver.port | 5660 | QueryIO service port specific to this node. |
| NameNode | queryio.ftpserver.ssl.enabled | false | Whether QueryIO Secure FTP is enabled. |
| NameNode | queryio.ftpserver.ssl.port | 5670 | QueryIO Secure FTP port specific to this node. |
| NameNode | queryio.ftpserver.ssl.keystore | | SSL keystore for QueryIO Secure FTP, specific to this node. |
| NameNode | queryio.ftpserver.ssl.password | hadoop | SSL password for QueryIO Secure FTP, specific to this node. |
| NameNode | queryio.server.url | http://localhost:5678/queryio/ | QueryIO server URL specific to this node. |
| NameNode | fs.defaultFS | file:/// | The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. |
| NameNode | queryio.dfs.data.encryption.key | vdphkLF2eWW4k0h542VX1gKZWaT2JrIY | Server-side data encryption key. |

NOTE: Descriptions are taken from the Apache Hadoop documentation.
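
Most of the keys above are standard Hadoop configuration properties, so their current values can also be read programmatically through the usual Hadoop Configuration API. The sketch below is an illustration only, not part of QueryIO; it assumes the Hadoop client libraries and the cluster's configuration files are on the classpath, and prints a few of the values listed in the table:

```java
import org.apache.hadoop.conf.Configuration;

public class ShowHdfsSettings {
    public static void main(String[] args) {
        // Loads core-site.xml and hdfs-site.xml found on the classpath.
        Configuration conf = new Configuration();

        // Read a few of the keys listed above; the second argument is the
        // fallback used when the key is not set (values taken from the table).
        long blockSize = conf.getLong("dfs.blocksize", 67108864L);
        int maxReplication = conf.getInt("dfs.replication.max", 512);
        int handlerCount = conf.getInt("dfs.namenode.handler.count", 100);
        String nameDirs = conf.get("dfs.namenode.name.dir");

        System.out.println("dfs.blocksize              = " + blockSize);
        System.out.println("dfs.replication.max        = " + maxReplication);
        System.out.println("dfs.namenode.handler.count = " + handlerCount);
        System.out.println("dfs.namenode.name.dir      = " + nameDirs);
    }
}
```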

Add Key

You can also add custom configuration properties related to any HDFS cluster component.
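
A key added this way behaves like any other Hadoop configuration property once the cluster picks it up. As a minimal sketch (the key name queryio.custom.example below is hypothetical and used only for illustration), such a value could be read back from client code with the standard Configuration API:

```java
import org.apache.hadoop.conf.Configuration;

public class ReadCustomKey {
    public static void main(String[] args) {
        // Loads core-site.xml and hdfs-site.xml found on the classpath.
        Configuration conf = new Configuration();
        // "queryio.custom.example" is a hypothetical key added via Add Key;
        // the second argument is the fallback returned when the key is not set.
        String custom = conf.get("queryio.custom.example", "fallback-value");
        System.out.println("queryio.custom.example = " + custom);
    }
}
```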



Copyright © 2018 QueryIO Corporation. All Rights Reserved.

QueryIO, "Big Data Intelligence" and the QueryIO Logo are trademarks of QueryIO Corporation. Apache, Hadoop and HDFS are trademarks of The Apache Software Foundation.