This topic describes how to configure MIT Kerberos authentication when you access HDFS.
Prerequisites
A Hadoop cluster of EMR V3.40, EMR V4.10.1, or a minor version earlier than EMR V3.40 or EMR V4.10.1 is created. For more information, see Create a cluster.
Access HDFS by running the hadoop command
The following example demonstrates how to access HDFS as the test user:
- Run the following command on the gateway cluster that is associated with your EMR cluster to copy the krb5.conf file from the emr-header-1 node of the EMR cluster:
scp root@emr-header-1:/etc/krb5.conf /etc/
- Set the hadoop.security.authentication.use.has parameter to false.
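This parameter is typically defined in the core-site.xml file on the client. The following snippet is a minimal sketch of the property; the file path /etc/ecm/hadoop-conf/core-site.xml is an assumption based on the configuration files that the Java examples in this topic reference:
<property>
    <name>hadoop.security.authentication.use.has</name>
    <value>false</value>
</property>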
- Add a principal.
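The following command is a minimal sketch that uses the standard MIT Kerberos admin tool on the KDC node (emr-header-1 in this example). The principal name test is an example, and you are prompted to set a password for it:
kadmin.local -q "addprinc test"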
- Obtain a ticket. Run the following commands on the client where you want to run the hadoop command. In this example, the gateway cluster is used.
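For example, run the standard MIT Kerberos kinit command and enter the password of the principal. The principal name test is an example:
kinit test
You can then run the klist command to verify that the ticket is cached.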
- Run the following command on the gateway cluster to import environment variables:
export HADOOP_CONF_DIR=/etc/has/hadoop-conf
- Run the following hadoop command:
hadoop fs -ls /
Information similar to the following output is returned:
Found 6 items
drwxr-xr-x   - hadoop    hadoop  0 2021-03-29 11:16 /apps
drwxrwxrwx   - flowagent hadoop  0 2021-03-29 11:18 /emr-flow
drwxr-x---   - has       hadoop  0 2021-03-29 11:16 /emr-sparksql-udf
drwxrwxrwt   - hadoop    hadoop  0 2021-03-29 11:17 /spark-history
drwxr-x---   - hadoop    hadoop  0 2021-03-29 11:16 /tmp
drwxrwxrwt   - hadoop    hadoop  0 2021-03-29 11:17 /user
Access HDFS by using Java code
- Use a local ticket cache
Note: You must run the kinit command to obtain a ticket in advance. If an application uses an expired ticket, an error occurs.
public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Load the configurations of HDFS. You can retrieve a copy of the configurations from the EMR cluster.
    conf.addResource(new Path("/etc/ecm/hadoop-conf/hdfs-site.xml"));
    conf.addResource(new Path("/etc/ecm/hadoop-conf/core-site.xml"));
    // Run the kinit command to obtain a ticket in advance by using a Linux account.
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation.loginUserFromSubject(null);
    FileSystem fs = FileSystem.get(conf);
    FileStatus[] fsStatus = fs.listStatus(new Path("/"));
    for (int i = 0; i < fsStatus.length; i++) {
        System.out.println(fsStatus[i].getPath().toString());
    }
}
- (Recommended) Use the keytab file
Note: The keytab file is permanently valid. Its validity is independent of local tickets.
public static void main(String[] args) throws IOException {
    String keytab = args[0];
    String principal = args[1];
    Configuration conf = new Configuration();
    // Load the configurations of HDFS. You can retrieve a copy of the configurations from the EMR cluster.
    conf.addResource(new Path("/etc/ecm/hadoop-conf/hdfs-site.xml"));
    conf.addResource(new Path("/etc/ecm/hadoop-conf/core-site.xml"));
    // Use the keytab file. You can retrieve the keytab file from the emr-header-1 node of the EMR cluster by running a command.
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation.loginUserFromKeytab(principal, keytab);
    FileSystem fs = FileSystem.get(conf);
    FileStatus[] fsStatus = fs.listStatus(new Path("/"));
    for (int i = 0; i < fsStatus.length; i++) {
        System.out.println(fsStatus[i].getPath().toString());
    }
}
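The comment in the preceding code mentions that you can retrieve the keytab file from the emr-header-1 node by running a command. The following command is a minimal sketch that uses the standard MIT Kerberos admin tool; the principal name test and the output path are examples:
kadmin.local -q "ktadd -k /root/test.keytab test"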
Dependencies in the pom.xml file:
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>x.x.x</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>x.x.x</version>
    </dependency>
</dependencies>
Note: x.x.x indicates the Hadoop version of the EMR cluster.
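The following command is a hypothetical usage example for the keytab-based program. It assumes that the code is compiled into a class named HdfsKeytabExample and packaged into demo.jar; the class name, JAR file name, keytab path, and principal are all examples:
java -cp demo.jar:$(hadoop classpath) HdfsKeytabExample /root/test.keytab test
Depending on your Kerberos configuration, you may need to pass the full principal name that includes the realm, such as test@EMR.COM (the realm is an example).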