Hadoop Security Setup: Kerberos

In this Chapter

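This chapter covers the steps for Kerberos configuration: installing and testing Kerberos, setting up the KDC, shutting down the cluster, enabling Hadoop security, configuring the cluster components, integrating Kerberos with QueryIO, and starting the cluster.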

Introduction

Kerberos is a computer network authentication protocol that uses "tickets" to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner.
A Kerberos principal is a unique identity to which Kerberos can assign tickets. For Hadoop, principals should have the format username/fully.qualified.domain.name@REALM-NAME.COM, where username is the name of an existing Unix account, such as hdfs or mapred.
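The principal format described above can be illustrated with a small shell sketch. The host name node1.example.com and the helper function names are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Build a Hadoop-style principal: username/fully.qualified.domain.name@REALM
make_principal() {
    printf '%s/%s@%s\n' "$1" "$2" "$3"
}

# Split a principal back into its parts with POSIX parameter expansion.
principal_user()  { p=${1%@*}; printf '%s\n' "${p%%/*}"; }
principal_host()  { p=${1%@*}; printf '%s\n' "${p#*/}"; }
principal_realm() { printf '%s\n' "${1##*@}"; }

# node1.example.com is a placeholder host name (assumption).
principal=$(make_principal hdfs node1.example.com REALM-NAME.COM)
echo "$principal"
```

On a real node, the host part would normally come from the machine's fully qualified domain name rather than a hard-coded string.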

Download the Kerberos source distribution (krb5-1.10.tar) before you begin.

NOTE: You must have administrative privileges to set up Kerberos.

Install Kerberos

Start by unpacking the Kerberos source distribution (krb5-1.10.tar) into a directory of your choice, for example '/app/krb5-1.10'.

To create the build, run the configure script and then make from the src directory of the unpacked distribution.

The next step is to install the binaries, which is done with make install.

To install the binaries under a different destination directory, pass DESTDIR to make install.

Testing Kerberos

The Kerberos distribution provides built-in regression tests. To test the build, run make check from the same directory in which you ran make.
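The build, install, and test steps can be sketched as shell functions. This is a minimal sketch that assumes the distribution was unpacked to /app/krb5-1.10 and is built with the default configure options; the function names are hypothetical.

```shell
#!/bin/sh
# Assumed location of the unpacked source tree; adjust as needed.
KRB5_SRC="${KRB5_SRC:-/app/krb5-1.10}"

# Each step runs in a subshell so the caller's working directory is unchanged.
build_krb5() {
    # The configure script lives in the src subdirectory.
    ( cd "$KRB5_SRC/src" && ./configure && make )
}

check_krb5() {
    # Runs the built-in regression tests.
    ( cd "$KRB5_SRC/src" && make check )
}

install_krb5() {
    # With an argument, stage the install under that directory via DESTDIR.
    if [ -n "${1:-}" ]; then
        ( cd "$KRB5_SRC/src" && make install DESTDIR="$1" )
    else
        ( cd "$KRB5_SRC/src" && make install )
    fi
}

# Example invocation (commented out so that nothing runs by accident):
#   build_krb5 && check_krb5 && install_krb5
```

Running the regression tests before make install keeps a broken build from being installed.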

Installing Key Distribution Centers (KDCs)

KDCs issue Kerberos tickets. The master KDC holds the master copy of the database, which is propagated to the slave KDCs at regular intervals, so each KDC has a copy of the Kerberos database. All database changes are made on the master KDC; the slave KDCs provide Kerberos ticket-granting services only.

Setup KDC

Editing the Configuration Files

Modify the configuration files, /etc/krb5.conf and /usr/local/var/krb5kdc/kdc.conf, to reflect the correct information for your realm (such as the hostnames and realm name). Most of the tags in the configuration files have default values, but some tags in the krb5.conf file must be specified explicitly.

krb5.conf

The krb5.conf file contains Kerberos configuration information, including the locations of the KDCs and admin servers for the Kerberos realms of interest, defaults for the current realm and for Kerberos applications, and mappings of hostnames onto Kerberos realms. The default install directory of the krb5.conf file is /etc. You can override this location by setting the environment variable 'KRB5_CONFIG'.

Replace the contents of the krb5.conf file with the settings for your realm. At a minimum, the file should name the default realm and the KDC and admin server hosts.
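A minimal krb5.conf can be sketched as follows. The realm EXAMPLE.COM and the host kdc.example.com are placeholder assumptions, not values from this guide, and the script writes to a scratch file rather than /etc/krb5.conf so that the result can be reviewed first.

```shell
#!/bin/sh
# Placeholder values (assumptions); substitute your own realm and KDC host.
REALM="${REALM:-EXAMPLE.COM}"
KDC_HOST="${KDC_HOST:-kdc.example.com}"
KRB5_CONF_OUT="${KRB5_CONF_OUT:-${TMPDIR:-/tmp}/krb5.conf.sample}"

# DNS domains are written in lowercase in the [domain_realm] section.
DOMAIN=$(printf '%s' "$REALM" | tr '[:upper:]' '[:lower:]')

cat > "$KRB5_CONF_OUT" <<EOF
[libdefaults]
    default_realm = $REALM

[realms]
    $REALM = {
        kdc = $KDC_HOST
        admin_server = $KDC_HOST
    }

[domain_realm]
    .$DOMAIN = $REALM
    $DOMAIN = $REALM
EOF

cat "$KRB5_CONF_OUT"
```

After checking the generated file, copy it to /etc/krb5.conf, or point KRB5_CONFIG at it.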

kdc.conf

The kdc.conf file contains KDC configuration information, including the defaults used when issuing Kerberos tickets. The default install directory of the kdc.conf file is /usr/local/var/krb5kdc. You can override this location by setting the environment variable 'KRB5_KDC_PROFILE'. A sample kdc.conf file is also available at "/usr/local/share/examples/krb5/kdc.conf".

Replace the contents of the kdc.conf file with the settings for your realm, such as the database location, the ACL file, and the ticket lifetimes.
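A minimal kdc.conf can be sketched in the same way. The realm EXAMPLE.COM, the port, and the ticket lifetimes below are illustrative assumptions; the paths follow the default directory named in this guide.

```shell
#!/bin/sh
# Placeholder realm (assumption); the directory is the default from this guide.
REALM="${REALM:-EXAMPLE.COM}"
KDC_DIR="${KDC_DIR:-/usr/local/var/krb5kdc}"
KDC_CONF_OUT="${KDC_CONF_OUT:-${TMPDIR:-/tmp}/kdc.conf.sample}"

cat > "$KDC_CONF_OUT" <<EOF
[kdcdefaults]
    kdc_ports = 88

[realms]
    $REALM = {
        database_name = $KDC_DIR/principal
        acl_file = $KDC_DIR/kadm5.acl
        key_stash_file = $KDC_DIR/.k5.$REALM
        max_life = 10h 0m 0s
        max_renewable_life = 7d 0h 0m 0s
    }
EOF

cat "$KDC_CONF_OUT"
```

The acl_file value here is the one that the ACL file name has to match when administrators are added later.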

Create Database

The Kerberos database and the optional stash file are created with the "kdb5_util" command on the master KDC.
The stash file is a local copy of the master key that resides in encrypted form on the KDC's local disk. It is used to authenticate the KDC to itself automatically before the kadmind and krb5kdc daemons start. If you choose to install a stash file, restrict its access permissions to root only. You can also choose not to install a stash file. kdb5_util will prompt you for the master key for the Kerberos database. This key can be any string.

To create a Kerberos database and stash file on the KDC, run kdb5_util create with the -s option.

This will create five files in the directory specified in your kdc.conf file: the two Kerberos database files (principal and principal.ok), the administrative database file (principal.kadm5), the administrative database lock file (principal.kadm5.lock), and the stash file.

(The default directory is /usr/local/var/krb5kdc.) If you do not want a stash file, run the same command without the -s option.
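The database creation step can be sketched as below. The realm EXAMPLE.COM is a placeholder assumption, and the run helper is a hypothetical dry-run wrapper: it only prints the command unless RUN=1 is set, so the script can be read and tried safely on a machine without Kerberos installed.

```shell
#!/bin/sh
REALM="${REALM:-EXAMPLE.COM}"   # placeholder realm (assumption)

# Print each command; execute it only when RUN=1 is set.
run() { echo "+ $*"; [ "${RUN:-0}" != 1 ] || "$@"; }

create_kdc_database() {
    # -r selects the realm; -s also writes the stash file.
    # When executed, kdb5_util prompts for the database master key.
    run kdb5_util create -r "$REALM" -s
}

create_kdc_database
```

Dropping -s from the command line creates the database without a stash file.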

Adding Administrators to the ACL File

Next, create an Access Control List (ACL) file and put the Kerberos principal of at least one administrator into it. The kadmind daemon uses the ACL file to control which principals can view and make privileged modifications to the Kerberos database files. The filename should match the value set for acl_file in the kdc.conf file. The default file name is '/usr/local/var/krb5kdc/kadm5.acl'.

Each line of the ACL file has the format: Kerberos_principal permissions [target_principal] [restrictions]

Note that the order of entries is important; permissions are determined by the first matching entry.
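A small kadm5.acl can be sketched as follows. The principals and the realm EXAMPLE.COM are hypothetical; the permission letters used are * (all permissions), i (inquire), and l (list).

```shell
#!/bin/sh
REALM="${REALM:-EXAMPLE.COM}"   # placeholder realm (assumption)
ACL_OUT="${ACL_OUT:-${TMPDIR:-/tmp}/kadm5.acl.sample}"

# Entries are matched top to bottom, so the specific rule goes before the
# wildcard: admin/admin gets all permissions, every other */admin principal
# may only inquire and list.
cat > "$ACL_OUT" <<EOF
admin/admin@$REALM    *
*/admin@$REALM        il
EOF

cat "$ACL_OUT"
```

If the two lines were swapped, admin/admin would match the wildcard entry first and would be limited to inquire and list.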

Adding Administrators to the Kerberos Database

Next, add at least one administrative principal to the Kerberos database. Use kadmin.local on the master KDC for this purpose. The administrative principal you create should be one that you added to the ACL file.

For example, the addprinc command in kadmin.local creates a principal such as admin/admin@REALM-NAME.COM.
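Adding an administrator can be sketched as a non-interactive kadmin.local query. The principal admin/admin and the realm EXAMPLE.COM are placeholder assumptions, and the run helper is a hypothetical dry-run wrapper that only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
REALM="${REALM:-EXAMPLE.COM}"                   # placeholder realm (assumption)
ADMIN_PRINCIPAL="${ADMIN_PRINCIPAL:-admin/admin@$REALM}"

# Print each command; execute it only when RUN=1 is set.
run() { echo "+ $*"; [ "${RUN:-0}" != 1 ] || "$@"; }

add_admin_principal() {
    # -q passes a single query to kadmin.local; when executed, addprinc
    # prompts for the new principal's password.
    run kadmin.local -q "addprinc $ADMIN_PRINCIPAL"
}

add_admin_principal
```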

Creating a kadmind Keytab

A keytab is a file containing pairs of Kerberos principals and encrypted copies of those principals' keys. Keytab files are unique to each host, since their keys include the hostname. A keytab is used to authenticate a principal on a host to Kerberos without human interaction and without storing a password in a plain text file. The kadmind keytab holds the key that the legacy administration daemons kadmind4 and v5passwdd use to decrypt administrators' or clients' Kerberos tickets to determine whether or not they should have access to the database.

You need to create the kadmin keytab with entries for the principals kadmin/admin and kadmin/changepw. (These principals are added to the Kerberos database automatically when it is created.) To create the kadmin keytab, run kadmin.local and use the ktadd command with the -k option to name the keytab file.
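The keytab step can be sketched as below. The keytab path is an assumption based on the default KDC directory used in this guide, and the run helper is a hypothetical dry-run wrapper that only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Assumed keytab location under the default KDC directory.
KADM5_KEYTAB="${KADM5_KEYTAB:-/usr/local/var/krb5kdc/kadm5.keytab}"

# Print each command; execute it only when RUN=1 is set.
run() { echo "+ $*"; [ "${RUN:-0}" != 1 ] || "$@"; }

create_kadmind_keytab() {
    # ktadd -k writes the keys of the listed principals into the keytab file.
    run kadmin.local -q "ktadd -k $KADM5_KEYTAB kadmin/admin kadmin/changepw"
}

create_kadmind_keytab
```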

Starting the Kerberos Daemons on KDC

To start Kerberos on the master KDC, run the krb5kdc and kadmind daemons.

Each daemon will fork and run in the background. If you want the daemons to start automatically at boot time, add them to the KDC's /etc/rc or /etc/inittab file. A stash file is required for this.
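Starting the two daemons can be sketched as follows, assuming the binaries were installed under /usr/local/sbin (the default prefix for a source build). The run helper is a hypothetical dry-run wrapper that only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Assumed install location of the daemons.
KRB5_SBIN="${KRB5_SBIN:-/usr/local/sbin}"

# Print each command; execute it only when RUN=1 is set.
run() { echo "+ $*"; [ "${RUN:-0}" != 1 ] || "$@"; }

start_kdc_daemons() {
    # Start the KDC first, then the administration server.
    run "$KRB5_SBIN/krb5kdc" && run "$KRB5_SBIN/kadmind"
}

start_kdc_daemons
```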

Principals from QueryIO

Kerberos principals are added from QueryIO automatically: every QueryIO user is added as a principal in Kerberos.

The principal is created with the same credentials that the user has in QueryIO, i.e. the QueryIO username and password become the principal's username and password.

Shut Down the Cluster

To enable security, stop all nodes in the cluster and then change the configuration properties. All nodes must be stopped because a node restarted with security enabled cannot communicate with a node running without security enabled. This is done through the QueryIO UI: all NameNodes and DataNodes must be stopped manually. To stop a node, select the node and click Stop.

Enable Hadoop Security

core-site.xml

All configuration files throughout the cluster must have the same content. To enable Hadoop security, set hadoop.security.authentication to kerberos and hadoop.security.authorization to true in the core-site.xml file for all QueryIO components on every host.

You can find core-site.xml on every registered host machine at: $HOST_INSTALL_PATH/QueryIOPackage/hadoop-2.0.2-alpha/etc/$NODE_TYPE$-conf_$NODE_ID$/

($NODE_TYPE$ can be NameNode, DataNode, ResourceManager, or NodeManager, and $NODE_ID$ is the ID of the respective node.)
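The core-site.xml properties that switch Hadoop to Kerberos authentication can be sketched as below. The script writes the two property elements to a scratch file (a hypothetical path) so they can be pasted inside the configuration element of each core-site.xml.

```shell
#!/bin/sh
CORE_SITE_OUT="${CORE_SITE_OUT:-${TMPDIR:-/tmp}/core-site-security.xml}"

# Property elements to place inside <configuration> in core-site.xml.
cat > "$CORE_SITE_OUT" <<'EOF'
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
EOF

cat "$CORE_SITE_OUT"
```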

hdfs-site.xml

You can find hdfs-site.xml on every registered host machine at: $HOST_INSTALL_PATH/QueryIOPackage/hadoop-2.0.2-alpha/etc/$NODE_TYPE$-conf_$NODE_ID$/

($NODE_TYPE$ can be NameNode, DataNode, ResourceManager, or NodeManager, and $NODE_ID$ is the ID of the respective node.)

Append the security properties for the HDFS daemons, such as each daemon's Kerberos principal and keytab file, to hdfs-site.xml.

These properties have to be updated manually in the respective files.
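A typical set of hdfs-site.xml security properties can be sketched as follows. This is an illustration rather than the exact list required by QueryIO: the realm EXAMPLE.COM and the keytab path are hypothetical, and the principals use Hadoop's _HOST placeholder, which each daemon replaces with its own host name.

```shell
#!/bin/sh
REALM="${REALM:-EXAMPLE.COM}"                          # placeholder realm (assumption)
HDFS_KEYTAB="${HDFS_KEYTAB:-/etc/security/hdfs.keytab}" # hypothetical keytab path

# Emit one Hadoop <property> element for a name/value pair.
prop() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

hdfs_security_properties() {
    prop dfs.block.access.token.enable true
    prop dfs.namenode.kerberos.principal "hdfs/_HOST@$REALM"
    prop dfs.namenode.keytab.file "$HDFS_KEYTAB"
    prop dfs.datanode.kerberos.principal "hdfs/_HOST@$REALM"
    prop dfs.datanode.keytab.file "$HDFS_KEYTAB"
}

hdfs_security_properties
```

The printed elements belong inside the configuration element of hdfs-site.xml on every node.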

Configure Cluster Components

Append the following options to the queryio.<cluster_component>.options property for all cluster components.

-Djava.security.krb5.realm=queryiorealm -Dsun.security.krb5.debug=true -Djava.security.krb5.kdc=192.168.0.1

Select the component and click Configure. For example, for a DataNode, select the DataNode and click Configure, then append the options to the "queryio.datanode.options" property and click Save. Repeat the process for all components.
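The option string above can be assembled from variables so that the realm and KDC address are defined in one place. The variable names are hypothetical; the defaults are the values shown in this guide. Java expects java.security.krb5.realm and java.security.krb5.kdc to be set together, so the sketch always emits both.

```shell
#!/bin/sh
KRB5_REALM="${KRB5_REALM:-queryiorealm}"
KRB5_KDC="${KRB5_KDC:-192.168.0.1}"
KRB5_DEBUG="${KRB5_DEBUG:-true}"   # set to false once the setup works

KRB5_JVM_OPTS="-Djava.security.krb5.realm=$KRB5_REALM -Dsun.security.krb5.debug=$KRB5_DEBUG -Djava.security.krb5.kdc=$KRB5_KDC"

echo "$KRB5_JVM_OPTS"
```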

Integrating Kerberos with QueryIO

Once Kerberos has been configured successfully, QueryIO can be integrated with it by setting the useKerberos property to true in the queryio.properties file, which is stored in "tomcat/webapps/queryio/conf".
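Setting the property can be sketched with a small helper. It assumes queryio.properties uses the usual key=value layout with one property per line; the function name is hypothetical, and the demonstration runs against a scratch copy rather than the real file.

```shell
#!/bin/sh
# Set useKerberos=true in the given properties file, keeping a .bak copy
# when an existing line is rewritten.
set_use_kerberos() {
    file=$1
    if grep -q '^useKerberos=' "$file"; then
        sed -i.bak 's/^useKerberos=.*/useKerberos=true/' "$file"
    else
        echo 'useKerberos=true' >> "$file"
    fi
}

# Demonstration on a scratch file.
demo_props=$(mktemp)
printf 'useKerberos=false\n' > "$demo_props"
set_use_kerberos "$demo_props"
cat "$demo_props"
```

For the real file, pass the path to queryio.properties under tomcat/webapps/queryio/conf.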

Starting up the Cluster

Now you can start the cluster through the QueryIO UI. Start all the NameNodes, DataNodes, ResourceManagers, and NodeManagers. To start a NameNode, select the NameNode and click Start; to start a DataNode, select the DataNode and click Start; and so on.

If all the nodes in the cluster start without errors, your QueryIO cluster has been configured with Kerberos successfully.



Copyright © 2018 QueryIO Corporation. All Rights Reserved.

QueryIO, "Big Data Intelligence" and the QueryIO Logo are trademarks of QueryIO Corporation. Apache, Hadoop and HDFS are trademarks of The Apache Software Foundation.