This topic describes how to manually map the Alibaba Cloud account of a DataWorks tenant member to the account of a specified identity in an E-MapReduce (EMR) cluster. After the mapping is configured, the tenant member runs tasks in DataWorks as the mapped identity in the EMR cluster.
Precautions
A mapping between a tenant member account and an EMR cluster account takes effect for all workspaces in which the EMR cluster is registered. Modify the mapping only when your business requires it.
If you do not configure a mapping between a tenant member account and an EMR cluster account as described in this topic, DataWorks runs tasks on the EMR cluster based on the following default policies:
If you use a RAM user, the tasks are run as the system account of the EMR cluster that has the same name as the RAM user. If Lightweight Directory Access Protocol (LDAP) or Kerberos authentication is enabled for the EMR cluster, you must configure a mapping between the RAM user and an EMR cluster account as described in this topic. Otherwise, the tasks fail to run in DataWorks.
If you use an Alibaba Cloud account, you must manually configure a mapping between the Alibaba Cloud account and an EMR cluster account regardless of whether LDAP or Kerberos authentication is enabled for the EMR cluster. Otherwise, the tasks fail to run in DataWorks.
Note: The account that a user uses to access an EMR cluster in DataWorks depends on the access identity that you specified when you registered the EMR cluster.
RAM user: The task owner or a RAM user is used as the default access identity of an EMR cluster when you register the EMR cluster in DataWorks.
Alibaba Cloud account: An Alibaba Cloud account is used as the default access identity of an EMR cluster when you register the EMR cluster in DataWorks.
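The default policies above can be summarized as a small decision function. The following Python sketch is illustrative only: `Cluster`, `resolve_account`, and the identity-type strings are hypothetical names, not a DataWorks API. It assumes that an unmapped RAM user falls back to the same-name system account only when neither LDAP nor Kerberos is enabled, and it does not model password mismatches or the access identity chosen at registration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cluster:
    """Hypothetical view of an EMR cluster, for illustration only."""
    system_accounts: set[str]
    ldap_or_kerberos_enabled: bool = False
    # Manually configured mappings: tenant member account -> EMR cluster account.
    mappings: dict[str, str] = field(default_factory=dict)


def resolve_account(identity_type: str, member: str, cluster: Cluster) -> Optional[str]:
    """Return the EMR account a task would run as, or None if the task would fail."""
    # A manually configured mapping always takes precedence.
    if member in cluster.mappings:
        return cluster.mappings[member]
    if identity_type == "alibaba_cloud_account":
        # Alibaba Cloud accounts must always be mapped explicitly.
        return None
    if identity_type == "ram_user":
        # Assumption: the same-name fallback only works without LDAP or Kerberos.
        if cluster.ldap_or_kerberos_enabled:
            return None
        return member if member in cluster.system_accounts else None
    raise ValueError(f"unknown identity type: {identity_type!r}")
```

For example, with `Cluster(system_accounts={"alice"})`, `resolve_account("ram_user", "alice", cluster)` returns `"alice"`, whereas `resolve_account("ram_user", "bob", cluster)` returns `None`, which corresponds to the case in which no EMR cluster account has the same name as the RAM user.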
Usage Notes
Authentication method
DataWorks does not support mapping a tenant member account to an account of an EMR cluster on which both LDAP authentication and Kerberos authentication are enabled. If you configure this type of mapping, tasks fail to run in DataWorks.
Whitelist configuration
If Ranger authentication is enabled for an EMR cluster, you must add DataWorks to the whitelist of the EMR cluster to ensure that DataWorks can access the EMR cluster. For information about how to add DataWorks to the whitelist of an EMR cluster, see the Appendix: Add DataWorks to the whitelist of an EMR cluster section in this topic.
User management
If you use an account other than a system account of the EMR cluster for identity authentication, such as a Kerberos account, you must enable the corresponding authentication service for the EMR cluster and add the account that is used to develop EMR tasks in DataWorks to that service. For more information, see Manage third-party authentication files.
Data permissions
You can manage permissions on the services in an EMR cluster to isolate the data operation permissions of DataWorks users. For example, you can use Ranger to manage the permissions of the EMR cluster account that is mapped to an Alibaba Cloud account.
If Data Lake Formation (DLF) is specified as the metadata storage service for an EMR cluster and the DLF-Auth component is used to enable the data permission management feature of DLF, you can request data permissions in Security Center in the DataWorks console. For more information, see Manage permissions on DLF.
Mapping configuration
Tasks fail to run in DataWorks in the scenarios that are described in the following table.
| Scenario | Description |
| --- | --- |
| A system account of an EMR cluster is used for mapping configuration in DataWorks. | A RAM user is used to run tasks in DataWorks, but no EMR cluster account has the same name as the RAM user. |
| A system account of an EMR cluster is used for mapping configuration in DataWorks. | A RAM user is used to run tasks in DataWorks and a mapping between the RAM user and an EMR cluster account is manually configured, but the account name or password in the mapping differs from the actual account name or password in the EMR cluster. |
| A system account of an EMR cluster is used for mapping configuration in DataWorks. | An Alibaba Cloud account is used to run tasks in DataWorks, but the Alibaba Cloud account is not mapped to an EMR cluster account. |
| The LDAP or Kerberos account mapping type is used in DataWorks. | The LDAP or Kerberos authentication service is enabled for the EMR cluster, but the mapping between a tenant member account and an EMR cluster account is missing or incorrectly configured in DataWorks. |
| The LDAP or Kerberos account mapping type is used in DataWorks. | The Kerberos account mapping type is used in DataWorks, but the Kerberos authentication service is not enabled for the EMR cluster. |
| The LDAP or Kerberos account mapping type is used in DataWorks. | The LDAP account mapping type is used in DataWorks, but the LDAP authentication service is not enabled for the related component in the EMR cluster. |
Note: If you configure an LDAP account mapping in DataWorks, SQL tasks in DataWorks, such as Hive, Impala, Presto, and Trino tasks, use the mapped EMR cluster account for authentication by default. However, if LDAP authentication is not enabled for the related component in the EMR cluster, the tasks fail.
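The failure scenarios for the LDAP and Kerberos mapping types, together with the restriction under Authentication method, can be expressed as a checklist. The following Python sketch is hypothetical: `MappingType`, `mapping_failures`, and its parameters are illustrative names, not a DataWorks API. Treating a system account mapping on an LDAP- or Kerberos-enabled cluster as "incorrectly configured" is an assumption.

```python
from enum import Enum
from typing import Optional


class MappingType(Enum):
    SYSTEM = "Mapping to System Account"
    OPENLDAP = "Mapping to OpenLDAP Account"
    KERBEROS = "Mapping to Kerberos Account"


def mapping_failures(
    mapping_type: Optional[MappingType],
    kerberos_enabled: bool,
    ldap_components: set[str],
    component: str,
) -> list[str]:
    """List reasons why a task on `component` would fail; empty means no known problem.

    `mapping_type` is None when no mapping is configured for the tenant member.
    Kerberos is modeled per cluster and LDAP per component, as in the table above.
    """
    reasons: list[str] = []
    ldap_enabled = component in ldap_components
    if kerberos_enabled and ldap_enabled:
        reasons.append("mappings are not supported when both LDAP and Kerberos are enabled")
    # Assumption: a system account mapping counts as "incorrectly configured" here.
    if (kerberos_enabled or ldap_enabled) and mapping_type in (None, MappingType.SYSTEM):
        reasons.append("LDAP or Kerberos is enabled, but no LDAP or Kerberos mapping is configured")
    if mapping_type is MappingType.KERBEROS and not kerberos_enabled:
        reasons.append("Kerberos mapping is used, but Kerberos is not enabled for the cluster")
    if mapping_type is MappingType.OPENLDAP and not ldap_enabled:
        reasons.append(f"LDAP mapping is used, but LDAP is not enabled for {component}")
    return reasons
```

For example, an OpenLDAP mapping for a Presto task on a cluster where LDAP is enabled only for Hive yields one reason, matching the last row of the table.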
Limits
You can use only the following accounts or roles to configure identity mappings for all users:
An Alibaba Cloud account
RAM users or RAM roles to which the AliyunDataWorksFullAccess and AliyunEMRFullAccess policies are attached
RAM users or RAM roles that are assigned the Workspace Administrator role and to which the AliyunEMRFullAccess policy is attached
Member accounts that do not belong to the preceding types can be used to configure identity mappings only for themselves.
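The permission rule above can be sketched as follows. `Principal`, `can_configure_for_all`, and `can_configure_mapping` are hypothetical names used only for illustration; the policy names and the Workspace Administrator role come from the list above, and the actual check is performed by DataWorks itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    """Hypothetical model of the account that is configuring a mapping."""
    name: str
    is_alibaba_cloud_account: bool = False
    policies: frozenset[str] = frozenset()
    workspace_roles: frozenset[str] = frozenset()


def can_configure_for_all(p: Principal) -> bool:
    """True if the principal may configure identity mappings for all users."""
    if p.is_alibaba_cloud_account:
        return True
    # Both remaining cases require AliyunEMRFullAccess.
    if "AliyunEMRFullAccess" not in p.policies:
        return False
    return (
        "AliyunDataWorksFullAccess" in p.policies
        or "Workspace Administrator" in p.workspace_roles
    )


def can_configure_mapping(actor: Principal, target_member: str) -> bool:
    """Other members may configure a mapping only for themselves."""
    return can_configure_for_all(actor) or actor.name == target_member
```

For example, a RAM user that has only the AliyunDataWorksFullAccess policy can configure its own mapping but not the mapping of another member.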
Go to the Account Mappings page
Go to the Management Center page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.
In the left-side navigation pane of the SettingCenter page, click Cluster Management. On the Cluster Management page, click Register Cluster. In the Select Cluster Type dialog box, click E-MapReduce. The Register EMR Cluster page appears.
Go to the Account Mappings page.
On the EMR cluster page, find the desired EMR cluster and click the Account Mappings tab.
Configure a mapping between a tenant member account and the EMR cluster account
Click Edit Account Mappings in the upper-right corner of the Account Mappings tab. On the page that appears, perform the following steps to configure an identity mapping for the EMR cluster:
Upload a configuration file.
If Kerberos authentication is enabled for the EMR cluster, you must upload a keytab file first to ensure that EMR Trino and EMR Presto tasks can be run as expected. For more information, see Download the authentication credentials of a user account.
Configure a mapping.
Configuration Mode: You can set the parameter to Customize Configurations for Cluster or Reference Configurations of Another Cluster.
Mapping Type: the mapping type for cluster authentication. Valid values: Mapping to System Account, Mapping to OpenLDAP Account, and Mapping to Kerberos Account.
Note: If you select Mapping to Kerberos Account for the Mapping Type parameter, you must upload a keytab file.
Before you select Kerberos account mapping, make sure that the Kerberos authentication service is enabled for the EMR cluster. For more information, see Enable Kerberos authentication.
Before you select OpenLDAP account mapping, make sure that the LDAP authentication service is enabled for the related component in the EMR cluster. If you configure an LDAP account mapping in DataWorks, SQL tasks in DataWorks, such as Hive, Impala, Presto, and Trino tasks, use the mapped EMR cluster account for authentication by default. However, if LDAP authentication is not enabled for the related component in the EMR cluster, the tasks fail.
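The constraints on the mapping configuration can be checked before saving with a short sketch. The `MappingForm` class, its `keytab_path` field, and `validate_mapping_form` are hypothetical; the option strings are taken from the parameter descriptions above. The sketch does not check whether Kerberos or LDAP is actually enabled on the cluster.

```python
from dataclasses import dataclass
from typing import Optional

CONFIGURATION_MODES = {
    "Customize Configurations for Cluster",
    "Reference Configurations of Another Cluster",
}
MAPPING_TYPES = {
    "Mapping to System Account",
    "Mapping to OpenLDAP Account",
    "Mapping to Kerberos Account",
}


@dataclass
class MappingForm:
    """Hypothetical representation of the Account Mappings form."""
    configuration_mode: str
    mapping_type: str
    keytab_path: Optional[str] = None  # hypothetical: the uploaded keytab file


def validate_mapping_form(form: MappingForm) -> list[str]:
    """Return validation errors; an empty list means the form looks complete."""
    errors: list[str] = []
    if form.configuration_mode not in CONFIGURATION_MODES:
        errors.append(f"unknown configuration mode: {form.configuration_mode!r}")
    if form.mapping_type not in MAPPING_TYPES:
        errors.append(f"unknown mapping type: {form.mapping_type!r}")
    # A Kerberos account mapping requires an uploaded keytab file.
    if form.mapping_type == "Mapping to Kerberos Account" and not form.keytab_path:
        errors.append("a keytab file must be uploaded for Kerberos account mapping")
    return errors
```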
Appendix: Add DataWorks to the whitelist of an EMR cluster
If Ranger is enabled for an EMR cluster, you must add DataWorks to the whitelist of the EMR cluster and restart the Hive service before you can develop EMR tasks in DataWorks. Otherwise, an error such as Cannot modify spark.yarn.queue at runtime or Cannot modify SKYNET_BIZDATE at runtime is reported when the EMR tasks are run.
Restart the service.
After the whitelist is configured, you must restart the Hive service for the configurations to take effect.
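The error messages in this appendix come from Hive's check on parameters that a session sets at runtime. In upstream Apache Hive, such parameters are matched against a whitelist of regular expressions, and the `hive.security.authorization.sqlstd.confwhitelist.append` property appends a custom pattern to it; a parameter whose full name does not match is rejected with a "Cannot modify ... at runtime" error. The following Python sketch imitates that check. The `WHITELIST_APPEND` value and the `set_param` helper are hypothetical examples, not the values that DataWorks requires, and the sketch ignores Hive's built-in default whitelist.

```python
import re

# Hypothetical value for hive.security.authorization.sqlstd.confwhitelist.append.
# Hive treats the value as one Java regular expression; alternatives are joined with "|".
WHITELIST_APPEND = r"spark\..*|mapred\..*|SKYNET_.*"


def can_set_at_runtime(param: str, pattern: str = WHITELIST_APPEND) -> bool:
    """Return True if the whole parameter name matches the whitelist pattern."""
    # Java's Matcher.matches() requires a full match; re.fullmatch is the Python analog.
    return re.fullmatch(pattern, param) is not None


def set_param(session: dict[str, str], param: str, value: str) -> None:
    """Imitate a SET statement that is rejected when the parameter is not whitelisted."""
    if not can_set_at_runtime(param):
        raise PermissionError(f"Cannot modify {param} at runtime")
    session[param] = value


session: dict[str, str] = {}
set_param(session, "spark.yarn.queue", "default")
set_param(session, "SKYNET_BIZDATE", "20240101")
try:
    set_param(session, "hive.exec.dynamic.partition", "true")
except PermissionError as err:
    print(err)  # Cannot modify hive.exec.dynamic.partition at runtime
```

Because the whitelist is read when HiveServer2 starts, a change to it takes effect only after the Hive service is restarted, which is why the restart step is required.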