Before you run E-MapReduce (EMR) tasks in DataWorks, you must configure authentication and authorization on both the EMR side and the DataWorks side so that the tasks run as expected. This topic describes how to manage permissions in DataWorks and EMR.
Background information
In DataWorks, you can configure mappings between the members in a workspace and the accounts of the EMR cluster associated with the workspace. Each member then obtains the permissions of the mapped account on the EMR cluster. As a result, Alibaba Cloud accounts, task owners, and RAM users can have different data permissions when they run EMR tasks in DataWorks, and data permissions are isolated. For more information about the permission configurations that are required to run EMR tasks in DataWorks, see the Permission management at the EMR side and Permission management at the DataWorks side sections in this topic.
Limits
DataWorks allows you to use only the system account or OpenLDAP account to configure mappings between members in a workspace and the accounts of the EMR cluster that is registered to DataWorks as a compute engine instance. When you configure the mappings, take note of the following items:
You can configure mappings only at the cluster level. Only one authentication method can be used.
The EMR cluster accounts and passwords in the mappings must be the same as the actual accounts and passwords of the EMR cluster registered to DataWorks.
If the EMR cluster accounts and passwords in the mappings are inconsistent with the actual accounts and passwords, or if authentication is not enabled for the cluster, EMR tasks fail to run in DataWorks. The following table describes the details.
Value of the Mapping Type parameter | Description |
Mapping to System Account | If the accounts or passwords are inconsistent, EMR tasks fail to run in DataWorks. |
Mapping to OpenLDAP Account | EMR tasks fail to run in DataWorks if the accounts or passwords are inconsistent or if LDAP authentication is not enabled for the cluster. |
The authentication method for EMR clusters varies based on the compute engine. You can check whether an EMR cluster supports LDAP authentication in the EMR console.
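The limits above can be sketched as a pre-check that reports why EMR tasks would fail. This is only an illustration under stated assumptions: the `AccountMapping` type, the `find_mapping_problems` function, and the `"system"` / `"openldap"` labels are hypothetical names, not part of any DataWorks or EMR API. The sketch also assumes that the "authentication is not enabled" condition applies to OpenLDAP mappings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountMapping:
    """Hypothetical model of one mapping entry (not a DataWorks type)."""
    member: str           # DataWorks workspace member
    cluster_account: str  # EMR cluster account that the member maps to
    password: str         # password entered in the mapping


def find_mapping_problems(mapping_type, mappings, cluster_accounts, ldap_enabled):
    """Return the reasons why EMR tasks would fail, or an empty list.

    mapping_type     -- "system" or "openldap" (hypothetical labels)
    cluster_accounts -- dict of actual cluster account -> actual password
    ldap_enabled     -- whether LDAP authentication is enabled for the cluster
    """
    problems = []
    # Assumption: a disabled authentication method only matters for OpenLDAP mappings.
    if mapping_type == "openldap" and not ldap_enabled:
        problems.append("LDAP authentication is not enabled for the cluster")
    for m in mappings:
        actual_password = cluster_accounts.get(m.cluster_account)
        if actual_password is None:
            problems.append(
                f"{m.member}: account {m.cluster_account!r} does not exist in the cluster"
            )
        elif actual_password != m.password:
            problems.append(
                f"{m.member}: password for {m.cluster_account!r} does not match"
            )
    return problems
```

An empty result means that none of the failure conditions listed in this section was detected for the given mappings.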
Permission management at the EMR side
Enable LDAP authentication
If you want to use a non-system account for identity authentication in an EMR cluster, you must enable LDAP authentication for the cluster and add the account that is used to develop EMR tasks in DataWorks to LDAP users. In this case, you must perform the following steps:
Enable LDAP authentication for the cluster.
To use LDAP for identity authentication, you must enable LDAP authentication for the cluster. For more information, see Enable LDAP authentication.
Prepare the account that is used to run EMR tasks and add the account to LDAP users and the related DataWorks workspace.
We recommend that you add users who need to create, test, commit, and deploy EMR tasks in DataStudio to LDAP users and the related DataWorks workspace. For more information about how to add an account to a DataWorks workspace, see Overview of users, roles, and permissions.
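After an account is added to LDAP users, one way to confirm that it can authenticate is to attempt a simple bind with the OpenLDAP `ldapwhoami` client tool. The following sketch only builds the command and does not run it. The host name, port, and distinguished name (DN) are hypothetical placeholders, and `build_ldap_bind_check` is an illustrative helper. Replace the placeholders with the values of your own cluster.

```python
import shlex


def build_ldap_bind_check(host, port, bind_dn):
    """Build an ldapwhoami command that attempts a simple bind as bind_dn."""
    return [
        "ldapwhoami",
        "-x",                           # use simple authentication
        "-H", f"ldap://{host}:{port}",  # URI of the LDAP server
        "-D", bind_dn,                  # DN of the account to bind as
        "-W",                           # prompt for the password
    ]


# Hypothetical placeholder values, not taken from a real cluster.
command = build_ldap_bind_check(
    "ldap-host.example.internal", 10389, "uid=etl_dev,ou=people,o=emr"
)
print(shlex.join(command))
```

The `-W` option is used so that the password is entered at a prompt and does not appear in the command line or the shell history.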
Manage data permissions
You can manage the services in an EMR cluster to isolate data permissions. For example, you can use EMR Ranger to manage the permissions of an EMR cluster account that maps to an Alibaba Cloud account.
Permission management at the DataWorks side
Register an EMR cluster to DataWorks
Before you run EMR tasks in DataWorks, you must register an EMR cluster to DataWorks. This way, the cluster can be used as a compute engine instance in DataWorks. Only accounts to which the AliyunEMRFullAccess policy is attached can be used to perform this operation. For more information about how to attach the AliyunEMRFullAccess policy to an account, see Overview of users, roles, and permissions.
Grant permissions on DataWorks service modules to an account
If you want to run EMR tasks in DataWorks, you must be granted the permissions on DataWorks service modules such as DataStudio, Data Map, Data Quality, and intelligent monitoring. After you obtain the permissions, you can develop EMR tasks, perform O&M operations on the tasks, and monitor the data quality of the tasks. For more information about the permissions on service modules, see Overview of users, roles, and permissions.
Configure account mappings
After you register an EMR cluster to DataWorks in security mode, go to the Cluster Management page in SettingCenter. On this page, configure mappings between the members in a DataWorks workspace and the accounts of the EMR cluster registered to DataWorks. This way, the members in the DataWorks workspace have the same permissions as the mapped accounts.
Note: For more information about how to register an EMR cluster to DataWorks and configure the mappings, see Configure DataWorks.
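The effect of an account mapping can be pictured as a two-step lookup: a workspace member resolves to a cluster account, and the permissions of that account apply to the member. The following is a conceptual sketch only. The function name, the member names, the account names, and the permission strings are all hypothetical, and the sketch does not reflect how DataWorks stores or evaluates mappings internally.

```python
def effective_permissions(member, member_to_account, account_permissions):
    """Return the permissions a workspace member has through its mapped account."""
    if member not in member_to_account:
        raise LookupError(
            f"no cluster account is mapped to workspace member {member!r}"
        )
    account = member_to_account[member]
    # A mapped account without recorded permissions yields an empty set.
    return account_permissions.get(account, frozenset())


# Hypothetical example data.
member_to_account = {"ram_user_a": "etl_dev", "ram_user_b": "analyst"}
account_permissions = {
    "etl_dev": frozenset({"read:sales_db", "write:sales_db"}),
    "analyst": frozenset({"read:sales_db"}),
}

permissions_a = effective_permissions("ram_user_a", member_to_account, account_permissions)
permissions_b = effective_permissions("ram_user_b", member_to_account, account_permissions)
```

Because the two members map to different cluster accounts, they end up with different data permissions, which is the isolation that the mappings are intended to provide.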