After you register a Cloudera's Distribution Including Apache Hadoop (CDH) cluster or Cloudera Data Platform (CDP) cluster in DataWorks, you can configure a mapping between an Alibaba Cloud account or a RAM user of a DataWorks tenant member and the account of a specific identity in the CDH or CDP cluster. This way, the tenant member can use the mapped identity of the cluster to access the cluster. The procedure of configuring a mapping to a specific identity in a CDP cluster is similar to that of configuring a mapping to a specific identity in a CDH cluster. This topic describes how to configure a mapping to a specific identity in a CDH cluster.
Mapping types
The account that is used to access a CDH cluster and execute the code of CDH tasks in DataWorks varies based on the default access identity that you specify when you register the CDH cluster. For more information, see the Configure the default access identity for the cluster section of the "Register a CDH or CDP cluster to DataWorks" topic. The following table describes the accounts that you can specify as default access identities and the supported mapping types.
Account type | Description | Mapping type | Description
Cluster account | The cluster account that you specify as the default access identity is used to execute the code of CDH tasks regardless of who runs CDH tasks in DataWorks. For example, if you specify a cluster account as the default access identity, the cluster account is used to run CDH tasks regardless of whether the CDH tasks are submitted by an Alibaba Cloud account, a RAM user that is assigned the Workspace Administrator role, or a RAM user that is assigned only the Development role. | No Authentication | By default, the Mapping Type parameter is set to No Authentication. Important: If you specify a mapping account as the default access identity, you cannot set the Mapping Type parameter to No Authentication. Otherwise, CDH tasks will fail because no access identity is configured for the Alibaba Cloud account or RAM user. You can set the Mapping Type parameter to System account mapping, OPEN LDAP account mapping, or Kerberos account mapping based on your business requirements.
Mapping account | The CDH system account, Kerberos account, or OpenLDAP account that is mapped to an Alibaba Cloud account or a RAM user is used to execute the code of CDH tasks when the Alibaba Cloud account or RAM user is used by a workspace member to run CDH tasks in DataWorks. If you specify a mapping account as the default access identity, you must go to the Account Mapping tab to configure a mapping between a CDH cluster account and an Alibaba Cloud account or a RAM user. Important: You may fail to use an Alibaba Cloud account or a RAM user to run CDH tasks in the following scenarios: | System account mapping |
 | | OPEN LDAP account mapping |
 | | Kerberos account mapping |
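The rules in the preceding table can be summarized as a small decision function. The following Python code is only an illustrative sketch and is not part of DataWorks: the enum names, the `resolve_execution_account` function, and the `hadoop_admin` and `etl_user` account names are hypothetical. The sketch also assumes that a task fails when the submitter has no mapping configured.

```python
from enum import Enum
from typing import Dict


class DefaultIdentity(Enum):
    CLUSTER_ACCOUNT = "Cluster account"
    MAPPING_ACCOUNT = "Mapping account"


class MappingType(Enum):
    NO_AUTHENTICATION = "No Authentication"
    SYSTEM = "System account mapping"
    OPENLDAP = "OPEN LDAP account mapping"
    KERBEROS = "Kerberos account mapping"


def resolve_execution_account(
    default_identity: DefaultIdentity,
    mapping_type: MappingType,
    cluster_account: str,
    mappings: Dict[str, str],
    submitter: str,
) -> str:
    """Return the cluster-side account that would execute a CDH task."""
    # Cluster account: the same account runs every task, whoever submits it.
    if default_identity is DefaultIdentity.CLUSTER_ACCOUNT:
        return cluster_account
    # Mapping account: No Authentication is not allowed, because no access
    # identity would be configured for the submitter.
    if mapping_type is MappingType.NO_AUTHENTICATION:
        raise ValueError(
            "A mapping account requires a mapping type other than No Authentication"
        )
    # Assumption: a submitter without a configured mapping cannot run tasks.
    if submitter not in mappings:
        raise LookupError(f"No mapping is configured for {submitter}")
    return mappings[submitter]


# Hypothetical usage: the submitter is mapped to the system account etl_user.
account = resolve_execution_account(
    DefaultIdentity.MAPPING_ACCOUNT,
    MappingType.SYSTEM,
    "hadoop_admin",
    {"ram_user_1@xxx.onaliyun.com": "etl_user"},
    "ram_user_1@xxx.onaliyun.com",
)
print(account)
```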
Prerequisites
A CDH cluster account is created.
If you want to select Kerberos account mapping, make sure that Kerberos authentication is enabled for the CDH cluster.
Before you use an OpenLDAP account, make sure that the OpenLDAP service is enabled for the CDH cluster.
A CDH cluster is registered in DataWorks. For more information, see Register a CDH or CDP cluster to DataWorks.
Step 1: Go to the Account Mappings tab
Go to the Management Center page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.
In the left-side navigation pane of the SettingCenter page, click Cluster Management.
On the Cluster Management page, find the CDH cluster that you want to manage and go to its Account Mappings tab.
On the page that appears, you can configure a mapping between an Alibaba Cloud account or a RAM user of a DataWorks tenant member and the CDH cluster account. After you configure the mapping, the tenant member can use the specified identity of the mapped cluster account to run CDH tasks.
Step 2: Configure a mapping between a tenant member account and the CDH cluster account
In this step, you can perform the following operations to specify the cluster account that is used to execute the code of CDH tasks in DataWorks:
Configure the Mapping Type parameter.
You can set the Mapping Type parameter to No Authentication, System account mapping, OPEN LDAP account mapping, or Kerberos account mapping based on your business requirements. For more information, see Mapping types.
Configure a mapping.
Configure a mapping based on the mapping type that you specify.
Note: If you set the Mapping Type parameter to No Authentication, you do not need to configure a mapping. By default, the cluster account that is configured when you register a CDH or CDP cluster is used to run CDH tasks. For more information, see the Step 2: Register a CDH or CDP cluster section in the "Register a CDH or CDP cluster to DataWorks" topic.
System account mapping
You can configure a mapping between an Alibaba Cloud account or a RAM user and a system account of a CDH cluster based on the on-screen instructions.
Use Alibaba Cloud accounts to run tasks: Select an Alibaba Cloud account and configure a mapping between the Alibaba Cloud account and a system account.
Use RAM users to run tasks: Select a RAM user and configure a mapping between the RAM user and a system account. The following types of mappings are supported:
Mapping between accounts with the same name: The CDH cluster account that has the same account name as the RAM user is mapped to the RAM user. Example:
RAM user: ram_user_1@xxx.onaliyun.com
CDH cluster account that has the same account name as the RAM user: ram_user_1
If you use a RAM user named ram_user_1@xxx.onaliyun.com to run CDH tasks, the tasks are actually run by the CDH cluster account ram_user_1.
Note: If you use a RAM user to run CDH tasks in DataWorks, a CDH cluster account that has the same account name as the RAM user is actually used to run the tasks in the CDH cluster. You can also map the RAM user to a CDH cluster account that has a different account name.
To prevent tasks from failing to run, make sure that an account that has the same account name as the RAM user exists in the CDH cluster. If no such account exists, create an account on the
Mapping between accounts with different names: A CDH cluster account that has a different account name from the RAM user is mapped to the RAM user.
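The two mapping styles for RAM users can be illustrated with a short Python sketch. It is not DataWorks code: the function names and the `etl_user` account are hypothetical, and the sketch assumes that the same-name account is the part of the RAM user name before the `@` sign, as in the `ram_user_1` example above.

```python
from typing import Dict, Optional


def same_name_cluster_account(ram_user: str) -> str:
    """Derive the same-name CDH cluster account from a RAM user name."""
    # Assumption: the cluster account is the part before the first "@".
    local_part, _, _ = ram_user.partition("@")
    if not local_part:
        raise ValueError(f"Cannot derive an account name from {ram_user!r}")
    return local_part


def resolve_system_account(
    ram_user: str, different_name_mappings: Optional[Dict[str, str]] = None
) -> str:
    """Return the system account that would run tasks for a RAM user."""
    # Mapping between accounts with different names: use the configured account.
    if different_name_mappings and ram_user in different_name_mappings:
        return different_name_mappings[ram_user]
    # Mapping between accounts with the same name.
    return same_name_cluster_account(ram_user)


print(resolve_system_account("ram_user_1@xxx.onaliyun.com"))
print(resolve_system_account(
    "ram_user_1@xxx.onaliyun.com",
    {"ram_user_1@xxx.onaliyun.com": "etl_user"},
))
```

In the sketch, a different-name mapping takes effect only when one is configured for the RAM user. This precedence is chosen for illustration; in the console you select the mapping for each user explicitly.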
Kerberos account mapping
You can configure a mapping between an Alibaba Cloud account or a RAM user and a Kerberos account of the CDH cluster. A Kerberos account is specified in the Instance name@Domain name format. Example: cdn_test@HADOOP.COM.
During Kerberos authentication, the keytab and krb5.conf files are required.
The krb5.conf file is used to store the configurations of the Key Distribution Center (KDC) server.
The keytab file is used to store the authentication credentials of the resource principal. The file name must be in the Kerberos account.keytab format.
You must add the required account and upload the required files based on the on-screen instructions.
Note: If Kerberos authentication is enabled for Hive MetaStore of the CDH cluster, set the Mapping Type parameter to Kerberos account mapping. Otherwise, metadata collection is affected.
If you use Presto and select the Kerberos account mapping type, configure the Config.Properties and Presto.Jks files on the Basic Information tab of the CDH cluster. Make sure that Kerberos authentication is enabled for the CDH cluster.
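Before you upload the Kerberos files, you can check the account format and the file names locally. The following Python code is a minimal sketch with hypothetical helper names. It reads the Kerberos account.keytab naming rule literally, so it appends `.keytab` to the full account string. Confirm the expected file name against the on-screen instructions.

```python
from typing import Iterable, List, NamedTuple


class KerberosAccount(NamedTuple):
    instance: str
    domain: str


def parse_kerberos_account(account: str) -> KerberosAccount:
    """Split an account in the Instance name@Domain name format."""
    instance, sep, domain = account.partition("@")
    if not sep or not instance or not domain or "@" in domain:
        raise ValueError(f"{account!r} is not in the Instance name@Domain name format")
    return KerberosAccount(instance, domain)


def expected_keytab_name(account: str) -> str:
    """Build the keytab file name for an account (literal reading of the rule)."""
    parse_kerberos_account(account)  # Reject malformed accounts early.
    return f"{account}.keytab"


def missing_files(account: str, uploaded: Iterable[str]) -> List[str]:
    """List the required files that are absent from the uploaded file names."""
    required = ["krb5.conf", expected_keytab_name(account)]
    present = set(uploaded)
    return [name for name in required if name not in present]


print(parse_kerberos_account("cdn_test@HADOOP.COM"))
print(missing_files("cdn_test@HADOOP.COM", ["krb5.conf"]))
```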
OpenLDAP account mapping
You can configure a mapping between an Alibaba Cloud account or a RAM user and an OpenLDAP account based on the on-screen instructions.
Note: If you use Presto and select the OPEN LDAP account mapping type, configure the Config.Properties and Presto.Jks files on the Basic Information tab of the CDH cluster. Make sure that the OpenLDAP service is enabled for the cluster.
Click Complete. The mapping is configured. The tasks that are run by an Alibaba Cloud account or a RAM user are actually run by the cluster account to which the Alibaba Cloud account or the RAM user is mapped.