This blog post was written by M Fakhri Darmawan, Solution Architect at Alibaba Cloud Indonesia.
Alibaba Cloud Elastic MapReduce (EMR) is a big data processing solution that runs on the Alibaba Cloud platform. EMR is built on Alibaba Cloud ECS instances and is based on open-source Apache Hadoop and Apache Spark. EMR lets you use Hadoop and Spark ecosystem components, such as Apache Hive, Apache Kafka, Flink, Druid, and TensorFlow, to analyze and process data. You can use EMR to process data stored in different Alibaba Cloud data storage services, such as Object Storage Service (OSS), Log Service (SLS), and Relational Database Service (RDS). If you deploy a Hadoop-based cluster, HDFS is the base storage layer of Alibaba Cloud EMR.
HDFS security is crucial because HDFS is used for both data storage and data processing. One important aspect of data security is authorization. To simplify authorization management, Alibaba Cloud EMR can be integrated with Active Directory through Apache Knox (Knox). This guide shows how to configure the integration between Alibaba Cloud EMR and Active Directory.
Prerequisite: Active Directory and Service User
The Active Directory used in this configuration has a Hadoop OU (Organizational Unit) that contains a service user, hdfsadmin, and a regular user, hdfsuser.
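The OU and user names above reappear later in the Knox topology as LDAP distinguished names (DNs). The sketch below shows how such DNs can be assembled. It assumes the domain is ad.ondemand (taken from the sample topology later in this guide) and that each user's CN equals its account name, which may not hold in your directory. The helper names are hypothetical and the code does not escape special characters in DN values.

```python
def domain_to_base_dn(domain: str) -> str:
    """Turn a DNS-style AD domain such as 'ad.ondemand' into 'DC=ad,DC=ondemand'."""
    return ",".join(f"DC={part}" for part in domain.split(".") if part)


def user_dn(common_name: str, ou: str, domain: str) -> str:
    """Build the DN of an entry that sits directly inside a single OU."""
    return f"CN={common_name},OU={ou},{domain_to_base_dn(domain)}"


# Assumed values: the OU from this guide and the domain from the sample topology.
search_base = f"OU=hadoop,{domain_to_base_dn('ad.ondemand')}"
service_dn = user_dn("hdfsadmin", "hadoop", "ad.ondemand")

print(search_base)  # OU=hadoop,DC=ad,DC=ondemand
print(service_dn)   # CN=hdfsadmin,OU=hadoop,DC=ad,DC=ondemand
```

If your users live in nested OUs, or their CN differs from the logon name, look up the real DN in Active Directory instead of deriving it.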
Step 1: Open EMR Console
Log in to your Alibaba Cloud console and search for E-MapReduce.
Step 2: Open EMR Cluster
Open the EMR cluster details. Select the region where your cluster is deployed, then select cluster management.
On the cluster management page, select your cluster.
Step 3: Configure Knox
On your cluster page, select cluster service.
Scroll down the list of services and select Knox.
On the Knox service page, switch to the configure tab.
Select cluster topo to configure the topology file.
Scroll down the list to find the xml-direct-to-file-content configuration.
Edit the configuration and delete the existing ShiroProvider block. In the listing below, the block to delete is wrapped in a comment:
<?xml version="1.0" encoding="utf-8"?>
<topology>
<gateway>
<!-- Delete from this line
<provider>
<role>authentication</role>
<name>ShiroProvider</name>
<enabled>true</enabled>
<param>
<name>sessionTimeout</name>
<value>30</value>
</param>
<param>
<name>main.ldapRealm</name>
<value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
</param>
<param>
<name>main.ldapContextFactory</name>
<value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
</param>
<param>
<name>main.ldapRealm.contextFactory</name>
<value>$ldapContextFactory</value>
</param>
<param>
<name>main.ldapRealm.userDnTemplate</name>
<value>uid={0},ou=people,o=emr</value>
</param>
<param>
<name>main.ldapRealm.contextFactory.url</name>
<value>ldap://{{hostname_ldap}}:10389</value>
</param>
<param>
<name>main.ldapRealm.contextFactory.authenticationMechanism</name>
<value>simple</value>
</param>
<param>
<name>urls./**</name>
<value>authcBasic</value>
</param>
</provider>
until this line -->
<provider>
<role>identity-assertion</role>
<name>Default</name>
<enabled>true</enabled>
</provider>
<provider>
<role>hostmap</role>
<name>static</name>
<enabled>true</enabled>
<param>
<name>knox.{{clusterId_region}}.emr.aliyuncs.com</name>
<value>{{hostname_master_main}}</value>
</param>
</provider>
<provider>
<role>ha</role>
<name>HaProvider</name>
<enabled>{{ha_enable}}</enabled>
<param>
<name>HDFSUI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>WEBHDFS</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>YARNUI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>GANGLIAUI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>SPARKHISTORYUI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>JOBHISTORYUI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>NODEUI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>HBASEUI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>IMPALA-CATALOGD-UI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>IMPALA-STATESTORED-UI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>KUDUUI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
</provider>
</gateway>
<service>
<role>NODEUI</role>
<url>http://emr-header-1:8042</url>
<url>http://emr-header-2:8042</url>
</service>
<service>
<role>STORMUI</role>
<url>http://emr-header-1:9999</url>
<url>http://emr-header-2:9999</url>
</service>
<service>
<role>OOZIEUI</role>
<url>http://emr-header-1:11000/oozie</url>
<url>http://emr-header-2:11000/oozie</url>
</service>
</topology>
Replace it with the new ShiroProvider block shown below, and set the following parameters to your own values:
- main.ldapRealm.contextFactory.url
- main.ldapRealm.contextFactory.systemUsername
- main.ldapRealm.contextFactory.systemPassword
- main.ldapRealm.searchBase
<?xml version="1.0" encoding="utf-8"?>
<topology>
<gateway>
<provider>
<role>authentication</role>
<name>ShiroProvider</name>
<enabled>true</enabled>
<param>
<name>sessionTimeout</name>
<value>30</value>
</param>
<param>
<name>main.ldapRealm</name>
<value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
</param>
<param>
<name>main.ldapContextFactory</name>
<value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
</param>
<!-- main.ldapRealm.contextFactory needs to be placed before main.ldapRealm.contextFactory* entries -->
<param>
<name>main.ldapRealm.contextFactory</name>
<value>$ldapContextFactory</value>
</param>
<!-- AD url -->
<param>
<name>main.ldapRealm.contextFactory.url</name>
<!-- replace this IP address with your AD hostname or IP address -->
<value>ldap://127.0.0.1:389</value>
</param>
<!-- system user -->
<param>
<name>main.ldapRealm.contextFactory.systemUsername</name>
<!-- replace this DN with the DN of your system user for the integration -->
<value>CN=systemuser,OU=hadoop,DC=ad,DC=ondemand</value>
</param>
<!-- system user password -->
<param>
<name>main.ldapRealm.contextFactory.systemPassword</name>
<!-- replace this value with your system user password -->
<value>userpassword</value>
</param>
<param>
<name>main.ldapRealm.contextFactory.authenticationMechanism</name>
<value>simple</value>
</param>
<param>
<name>urls./**</name>
<value>authcBasic</value>
</param>
<!-- search base: the AD subtree in which users are looked up -->
<param>
<name>main.ldapRealm.searchBase</name>
<!-- replace this value with the DN of your OU -->
<value>OU=hadoop,DC=ad,DC=ondemand</value>
</param>
<param>
<name>main.ldapRealm.userObjectClass</name>
<value>person</value>
</param>
<param>
<name>main.ldapRealm.userSearchAttributeName</name>
<value>sAMAccountName</value>
</param>
</provider>
<provider>
<role>identity-assertion</role>
<name>Default</name>
<enabled>true</enabled>
</provider>
<provider>
<role>hostmap</role>
<name>static</name>
<enabled>true</enabled>
<param>
<name>knox.{{clusterId_region}}.emr.aliyuncs.com</name>
<value>{{hostname_master_main}}</value>
</param>
</provider>
<provider>
<role>ha</role>
<name>HaProvider</name>
<enabled>{{ha_enable}}</enabled>
<param>
<name>HDFSUI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>WEBHDFS</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>YARNUI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>GANGLIAUI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>SPARKHISTORYUI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>JOBHISTORYUI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>NODEUI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>HBASEUI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>IMPALA-CATALOGD-UI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>IMPALA-STATESTORED-UI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
<param>
<name>KUDUUI</name>
<value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
</param>
</provider>
</gateway>
<service>
<role>NODEUI</role>
<url>http://emr-header-1:8042</url>
<url>http://emr-header-2:8042</url>
</service>
<service>
<role>STORMUI</role>
<url>http://emr-header-1:9999</url>
<url>http://emr-header-2:9999</url>
</service>
<service>
<role>OOZIEUI</role>
<url>http://emr-header-1:11000/oozie</url>
<url>http://emr-header-2:11000/oozie</url>
</service>
</topology>
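Before saving the topology, it can help to confirm that the edited XML is still well-formed and that the four parameters listed above are present and non-empty. The following is a minimal sketch of such a check, not a tool shipped with EMR or Knox. The function names and the SAMPLE document are made up for illustration, and the sample is deliberately incomplete so that the check reports something.

```python
import xml.etree.ElementTree as ET

# The four parameters this guide asks you to set.
REQUIRED_PARAMS = (
    "main.ldapRealm.contextFactory.url",
    "main.ldapRealm.contextFactory.systemUsername",
    "main.ldapRealm.contextFactory.systemPassword",
    "main.ldapRealm.searchBase",
)


def shiro_params(topology_xml: str) -> dict:
    """Return name -> value pairs of the ShiroProvider; raises ParseError on broken XML."""
    root = ET.fromstring(topology_xml)
    for provider in root.iter("provider"):
        if provider.findtext("name") == "ShiroProvider":
            return {
                param.findtext("name"): (param.findtext("value") or "").strip()
                for param in provider.findall("param")
            }
    return {}


def missing_params(topology_xml: str) -> list:
    """List the required parameters that are absent or have an empty value."""
    params = shiro_params(topology_xml)
    return [name for name in REQUIRED_PARAMS if not params.get(name)]


# Illustrative, intentionally incomplete topology (two parameters left out).
SAMPLE = """<topology>
  <gateway>
    <provider>
      <role>authentication</role>
      <name>ShiroProvider</name>
      <enabled>true</enabled>
      <param>
        <name>main.ldapRealm.contextFactory.url</name>
        <value>ldap://127.0.0.1:389</value>
      </param>
      <param>
        <name>main.ldapRealm.searchBase</name>
        <value>OU=hadoop,DC=ad,DC=ondemand</value>
      </param>
    </provider>
  </gateway>
</topology>"""

print(missing_params(SAMPLE))
```

For the sample, the check reports the missing systemUsername and systemPassword entries. It only verifies presence, so a wrong DN or password still has to be caught by the login test.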
Step 4: Login Test
Open the YARN UI and log in with one of your Active Directory users.
If the configuration is correct, you are logged in successfully.
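The same test can be done without a browser. The topology uses authcBasic, so Knox expects HTTP Basic credentials. The sketch below builds the header and returns the HTTP status of one request. The function names are hypothetical, and the gateway URL is something you must supply yourself: its host, port, and path depend on your cluster and are not given in this guide.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials as the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def login_status(url: str, username: str, password: str, timeout: float = 10.0) -> int:
    """Send one authenticated GET and return the HTTP status code."""
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(username, password)}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Rejected credentials normally surface here as 401.
        return error.code
```

Calling login_status with the Knox URL of your YARN UI and the hdfsuser credentials should return 200 for a valid user and 401 for a wrong password. If the gateway presents a self-signed certificate, urlopen raises a URLError until you supply an SSL context that trusts that certificate.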