DataWorks: Permission management for running EMR tasks in DataWorks

Last Updated: Jul 17, 2024

Before you run E-MapReduce (EMR) tasks in DataWorks, you must configure authentication and authorization on both the EMR side and the DataWorks side so that the tasks run as expected. This topic describes how to manage permissions on DataWorks and EMR.

Background information

In DataWorks, you can map the members of a workspace to accounts of the EMR cluster that is associated with the workspace. Each member then obtains the permissions of the mapped account on the EMR cluster. This way, Alibaba Cloud accounts, task owners, and RAM users have different data permissions when they run EMR tasks in DataWorks, and data permissions are isolated. For more information about the permission configurations that are required to run EMR tasks in DataWorks, see the Permission management at the EMR side and Permission management at the DataWorks side sections in this topic.

Limits

DataWorks can map the members in a workspace only to system accounts or OpenLDAP accounts of the EMR cluster that is registered to DataWorks as a compute engine instance. When you configure the mappings, take note of the following items:

  • You can configure mappings only at the cluster level. Only one authentication method can be used for a cluster.

  • The EMR cluster accounts and passwords in the mappings must be the same as the actual accounts and passwords of the EMR cluster registered to DataWorks.

If the EMR cluster accounts and passwords in the mappings are inconsistent with the actual accounts and passwords, or if authentication is not enabled for the cluster, EMR tasks fail to run in DataWorks. The following table describes the details.

Value of the Mapping Type parameter: Mapping to System Account

Description: If the accounts or passwords are inconsistent, EMR tasks fail to run in DataWorks.

Value of the Mapping Type parameter: Mapping to OpenLDAP Account

Description: In the following scenarios, EMR tasks fail to run in DataWorks:

  • LDAP authentication is enabled for the desired cluster but no account mapping is configured in DataWorks.

  • LDAP authentication is enabled for DataWorks but is disabled for the desired service in the EMR cluster.

    Note

    If you use an OpenLDAP account to configure the mappings, SQL tasks such as Hive, Impala, and Presto tasks use this account for authentication by default. In this case, if LDAP authentication is not enabled for the desired service in the EMR cluster, the SQL tasks fail to run.

Note

The authentication method for EMR clusters varies based on the compute engine. You can check whether an EMR cluster supports LDAP authentication in the EMR console.
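
The failure conditions described in this section can be written down as a small check. The following Python sketch is illustrative only: the MappingState class, its field names, and the "system" and "openldap" labels are assumptions made for this example and are not part of any DataWorks or EMR API. It only encodes the rules stated above so that they are easier to reason about.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MappingState:
    """Hypothetical snapshot of one cluster's mapping setup (not a DataWorks API object)."""
    mapping_type: str              # "system" or "openldap" (illustrative labels)
    mapping_configured: bool       # an account mapping exists in DataWorks
    credentials_match: bool        # mapped account and password equal the real cluster ones
    ldap_enabled_on_service: bool  # LDAP authentication is on for the EMR service


def failure_reasons(state: MappingState) -> List[str]:
    """Return the reasons, per the rules in this section, why EMR tasks would fail."""
    if state.mapping_type not in ("system", "openldap"):
        raise ValueError(f"unsupported mapping type: {state.mapping_type!r}")

    reasons: List[str] = []

    # Applies to both mapping types: mapped credentials must match the cluster.
    if state.mapping_configured and not state.credentials_match:
        reasons.append("mapped account or password differs from the cluster account")

    if state.mapping_type == "openldap":
        if state.ldap_enabled_on_service and not state.mapping_configured:
            reasons.append("LDAP is enabled for the cluster but no account mapping is configured")
        if state.mapping_configured and not state.ldap_enabled_on_service:
            reasons.append("OpenLDAP mapping is configured but LDAP is disabled for the EMR service")
    return reasons


if __name__ == "__main__":
    state = MappingState("openldap", mapping_configured=True,
                         credentials_match=True, ldap_enabled_on_service=False)
    for reason in failure_reasons(state):
        print("EMR tasks would fail:", reason)
```

An empty list means that none of the failure conditions listed in this section applies. It does not guarantee that a task succeeds.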

Permission management at the EMR side

  • Enable LDAP authentication

    If you want to use a non-system account for identity authentication in an EMR cluster, you must enable LDAP authentication for the cluster and add the account that is used to develop EMR tasks in DataWorks as an LDAP user. To do so, perform the following steps:

    1. Enable LDAP authentication for the cluster.

      For more information, see Enable LDAP authentication.

    2. Prepare the account that is used to run EMR tasks and add the account to LDAP users and the related DataWorks workspace.

      We recommend that you add every user who needs to create, test, commit, or deploy EMR tasks in DataStudio both as an LDAP user and as a member of the related DataWorks workspace. For more information about how to add an account to a DataWorks workspace, see Overview of users, roles, and permissions.

  • Manage data permissions

    You can manage the services in an EMR cluster to isolate data permissions. For example, you can use EMR Ranger to manage the permissions of an EMR cluster account that maps to an Alibaba Cloud account.
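
Step 2 of Enable LDAP authentication requires each task account to exist in two places: the LDAP users of the cluster and the members of the DataWorks workspace. The following Python sketch shows one way to spot gaps before tasks are run. It is a minimal example under stated assumptions: the function name and the input lists are hypothetical, and the lists are expected to be collected by hand or exported by your own tooling. The sketch calls no LDAP, EMR, or DataWorks API.

```python
from typing import Dict, Iterable, List


def find_missing_accounts(
    task_accounts: Iterable[str],
    ldap_users: Iterable[str],
    workspace_members: Iterable[str],
) -> Dict[str, List[str]]:
    """Report task accounts that are absent from LDAP or from the workspace.

    All three inputs are plain account names supplied by the caller.
    """
    required = set(task_accounts)
    return {
        # Accounts that still need to be added as LDAP users.
        "missing_from_ldap": sorted(required - set(ldap_users)),
        # Accounts that still need to be added to the DataWorks workspace.
        "missing_from_workspace": sorted(required - set(workspace_members)),
    }


if __name__ == "__main__":
    report = find_missing_accounts(
        task_accounts=["alice", "bob"],
        ldap_users=["alice"],
        workspace_members=["bob", "carol"],
    )
    print(report)
```

Both lists in the result are empty when every task account is present in LDAP and in the workspace.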

Permission management at the DataWorks side

  • Register an EMR cluster to DataWorks

    Before you run EMR tasks in DataWorks, you must register an EMR cluster to DataWorks. This way, the cluster can be used as a compute engine instance in DataWorks. Only accounts to which the AliyunEMRFullAccess policy is attached can be used to perform this operation. For more information about how to attach the AliyunEMRFullAccess policy to an account, see Overview of users, roles, and permissions.

  • Grant permissions on DataWorks service modules to an account

    If you want to run EMR tasks in DataWorks, you must be granted the permissions on DataWorks service modules such as DataStudio, Data Map, Data Quality, and intelligent monitoring. After you obtain the permissions, you can develop EMR tasks, perform O&M operations on the tasks, and monitor the data quality of the tasks. For more information about the permissions on service modules, see Overview of users, roles, and permissions.

  • Configure account mappings

    After you register an EMR cluster to DataWorks in security mode, go to the Cluster Management page in SettingCenter. On this page, configure mappings between the members in a DataWorks workspace and the accounts of the EMR cluster registered to DataWorks. This way, the members in the DataWorks workspace have the same permissions as the mapped accounts.

    Note

    For more information about how to register an EMR cluster to DataWorks and configure the mappings, see Configure DataWorks.
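
The effect of an account mapping can be pictured as a two-step lookup: a workspace member resolves to an EMR cluster account, and the member then has whatever permissions that account holds. The following Python sketch models this idea only. The dictionaries, account names, and permission strings are invented for illustration, and real permissions are enforced by the EMR cluster services (for example, by EMR Ranger), not by code like this.

```python
from typing import Dict, Set


def effective_permissions(
    member: str,
    member_to_account: Dict[str, str],
    account_permissions: Dict[str, Set[str]],
) -> Set[str]:
    """Return the permissions a workspace member inherits from the mapped account."""
    account = member_to_account.get(member)
    if account is None:
        # No mapping is configured for this member.
        raise LookupError(f"no EMR cluster account is mapped to workspace member {member!r}")
    # Copy the set so that callers cannot modify the stored permissions.
    return set(account_permissions.get(account, set()))


if __name__ == "__main__":
    mappings = {"ram_user_alice": "emr_alice"}              # hypothetical mapping
    permissions = {"emr_alice": {"select:sales.orders"}}    # hypothetical permissions
    print(effective_permissions("ram_user_alice", mappings, permissions))
```

Raising an error for an unmapped member is a design choice of this sketch. It makes a missing mapping visible instead of silently returning an empty permission set.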