DataWorks: Appendix: Service-linked roles used by DataWorks to access Alibaba Cloud services to which compute engines belong

Last Updated: Aug 02, 2024

To use a compute engine, such as a MaxCompute or an E-MapReduce (EMR) compute engine, in DataWorks, you must authorize DataWorks to access the Alibaba Cloud service to which the compute engine belongs. After the authorization is complete, the system creates a service-linked role for that service. This topic describes the service-linked roles that are automatically created upon authorization and the policies that are attached to the roles.

Background information

When you perform compute engine-related operations in the DataWorks console, such as associating a compute engine with a workspace or modifying an existing compute engine instance, the system prompts you to authorize DataWorks. After the authorization is complete, the system creates a service-linked role for the Alibaba Cloud service to which the compute engine belongs.

Note
  • Only an Alibaba Cloud account or a RAM user to which the AliyunDataWorksFullAccess policy is attached can authorize DataWorks to perform operations related to compute engines. For information about how to grant permissions to a RAM user, see Grant permissions to a RAM user.

  • To create and manage data sources, you must authorize DataWorks to access the Alibaba Cloud service to which the compute engine belongs.

  • You can log on to the RAM console and go to the Roles page to search for a service-linked role that is created for the Alibaba Cloud service to which a compute engine belongs and view information about the service-linked role. For more information about service-linked roles, see Service-linked roles.
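As an alternative to the RAM console, a service-linked role can also be looked up from a script. The following Python sketch wraps the Alibaba Cloud CLI call `aliyun ram GetRole`. It assumes that the `aliyun` CLI is installed, is configured with credentials that are allowed to read RAM roles, and prints JSON. The function names `build_get_role_command` and `get_role` are hypothetical and used only for illustration.

```python
import json
import shutil
import subprocess


def build_get_role_command(role_name):
    """Build the argument list for `aliyun ram GetRole` (RAM GetRole API)."""
    return ["aliyun", "ram", "GetRole", "--RoleName", role_name]


def get_role(role_name):
    """Run the CLI and return the parsed response.

    Assumption: the `aliyun` CLI is on PATH, is configured with valid
    credentials, and writes a JSON document to standard output.
    """
    if shutil.which("aliyun") is None:
        raise RuntimeError("The aliyun CLI was not found on PATH")
    result = subprocess.run(
        build_get_role_command(role_name),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI reports a failure
    )
    return json.loads(result.stdout)
```

For example, `get_role("AliyunServiceRoleForDataworksEngine")` would query one of the roles described in this topic. If the role has not been created yet, the CLI call is expected to fail and `subprocess.CalledProcessError` is raised.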

The following table describes the service-linked roles that can be automatically created based on authorization.

| Role name | Role permission | References |
| --- | --- | --- |
| AliyunServiceRoleForDataworksEngine | Allows DataWorks to access MaxCompute. | Role 1: AliyunServiceRoleForDataworksEngine |
| AliyunServiceRoleForDataworksOnEmr | Obtains metadata information of an EMR DataLake cluster and previews related data records in Data Map. | Role 2: AliyunServiceRoleForDataworksOnEmr |
| AliyunServiceRoleForDataWorks | Obtains and modifies the network configurations of virtual private clouds (VPCs) and the configurations of security groups, and establishes network connections between DataWorks exclusive resource groups and data sources. | DataWorks service-linked role |
| AliyunServiceRoleForDataWorksDI | Allows Data Integration to obtain RAM roles and assume a custom RAM role to access a data source. | Description of the AliyunServiceRoleForDataWorksDI role |
| AliyunDIDefaultRole | Allows DataWorks to access resources of other Alibaba Cloud services activated by the current account during data source configuration, node configuration, and data synchronization. The services include ApsaraDB RDS, ApsaraDB for Redis, ApsaraDB for MongoDB, PolarDB-X, HybridDB for MySQL, AnalyticDB for PostgreSQL, PolarDB, Data Management (DMS), and Data Lake Formation (DLF). | Description of the AliyunDIDefaultRole role |
| AliyunServiceRoleForDataWorksOpenPlatform | Accesses and modifies events in EventBridge and supports message event capabilities in DataWorks Open Platform. | Appendix: DataWorks service-linked role |
| AliyunServiceRoleForDataWorksAccessDLF | Allows DataWorks to access metadata information of DLF, grants permissions on metadata to users, and revokes permissions on metadata from users. This role is used to implement application and request processing for permissions on DLF metadata in Security Center. | Appendix: Service-linked role used by DataWorks to access DLF |

The following sections describe the service-linked roles related to MaxCompute compute engines and EMR DataLake clusters.

Role 1: AliyunServiceRoleForDataworksEngine

  • Role name: AliyunServiceRoleForDataworksEngine

  • Role permissions: Authorizes DataWorks to access MaxCompute.

  • Policy attached to the role: AliyunServiceRolePolicyForDataworksEngine

  • Policy document:

    {
      "Version": "1",
      "Statement": [
        {
          "Action": "odps:*",
          "Effect": "Allow",
          "Resource": "*"
        },
        {
          "Action": [
            "pai:*",
            "paiplugin:*",
            "eas:*"
          ],
          "Resource": "*",
          "Effect": "Allow"
        },
        {
          "Action": "ram:DeleteServiceLinkedRole",
          "Resource": "*",
          "Effect": "Allow",
          "Condition": {
            "StringEquals": {
              "ram:ServiceName": "engine.dataworks.aliyuncs.com"
            }
          }
        }
      ]
    }
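To make the effect of this policy concrete, the following Python sketch checks a requested action against the policy document above. It is a simplified illustration and not the actual RAM evaluation logic: it assumes that `*` in an action matches any sequence of characters, supports only the `StringEquals` condition operator, and ignores the `Resource` element. The function names and the sample action `odps:CreateTable` are hypothetical.

```python
import json
import re

# Policy document of AliyunServiceRolePolicyForDataworksEngine, as listed above.
POLICY_JSON = """
{
  "Version": "1",
  "Statement": [
    {"Action": "odps:*", "Effect": "Allow", "Resource": "*"},
    {"Action": ["pai:*", "paiplugin:*", "eas:*"], "Resource": "*", "Effect": "Allow"},
    {
      "Action": "ram:DeleteServiceLinkedRole",
      "Resource": "*",
      "Effect": "Allow",
      "Condition": {
        "StringEquals": {"ram:ServiceName": "engine.dataworks.aliyuncs.com"}
      }
    }
  ]
}
"""
POLICY = json.loads(POLICY_JSON)


def action_matches(pattern, action):
    """Assumption: '*' matches any sequence of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, action) is not None


def is_allowed(policy, action, context=None):
    """Return True if an Allow statement covers the action.

    Simplification: Resource is ignored, and only StringEquals
    conditions are compared against the supplied context.
    """
    context = context or {}
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        patterns = statement["Action"]
        if isinstance(patterns, str):
            patterns = [patterns]
        if not any(action_matches(p, action) for p in patterns):
            continue
        required = statement.get("Condition", {}).get("StringEquals", {})
        if all(context.get(key) == value for key, value in required.items()):
            return True
    return False
```

Under these assumptions, `is_allowed(POLICY, "odps:CreateTable")` returns `True` and `is_allowed(POLICY, "oss:GetObject")` returns `False`. The `ram:DeleteServiceLinkedRole` action is covered only when the context contains `"ram:ServiceName": "engine.dataworks.aliyuncs.com"`, which mirrors the `Condition` element of the third statement.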

Role 2: AliyunServiceRoleForDataworksOnEmr

Important

Do not modify or delete the automatically created service-linked role or the policy that is attached to the role. Otherwise, you cannot use EMR features in DataWorks.

  • Role name: AliyunServiceRoleForDataworksOnEmr

  • Role permissions: Allows DataWorks to preview data records in Data Map, obtain the configurations of an EMR DataLake cluster, and obtain the metadata information of the cluster if the cluster uses DLF for metadata management.

  • Policy attached to the role: AliyunServiceRolePolicyForDataworksOnEmr

  • Policy document:

    • Permissions to access EMR

      {
        "Version": "1",
        "Statement": [
          {
            "Action": [
              "emr:GetCluster",
              "emr:GetOnKubeCluster",
              "emr:GetClusterClientMeta",
              "emr:GetApplicationConfigFile",
              "emr:ListClusters",
              "emr:ListNodes",
              "emr:ListNodeGroups",
              "emr:ListApplications",
              "emr:ListApplicationConfigs",
              "emr:ListApplicationConfigFiles",
              "emr:ListApplicationLinks",
              "emr:ListComponentInstances"
            ],
            "Resource": "*",
            "Effect": "Allow"
          }
        ]
      }

    • Permissions to access DLF

      If the EMR DataLake cluster that you want to access uses DLF to manage metadata, the policy attached to the service-linked role also contains the following access permissions on DLF. The permissions allow DataWorks to obtain metadata information of the EMR DataLake cluster.

      {
        "Action": [
          "dlf:SubmitQuery",
          "dlf:GetQueryResult",
          "dlf:GetTable",
          "dlf:ListDatabases",
          "dlf:GetTableProfile",
          "dlf:GetCatalogSettings",
          "dlf:BatchGrantPermissions",
          "dlf:ListPartitionsByFilter",
          "dlf:ListPartitions"
        ],
        "Resource": "*",
        "Effect": "Allow"
      }

    • Permissions to access Container Service for Kubernetes (ACK)

      If you want to access an EMR on ACK cluster, the policy attached to the role also contains the following access permissions on ACK:

      {
        "Action": [
          "cs:DescribeUserPermission",
          "cs:DescribeClusterDetail",
          "cs:DescribeClusterUserKubeconfig",
          "cs:GetClusters",
          "cs:GrantPermissions",
          "cs:RevokeK8sClusterKubeConfig"
        ],
        "Resource": "*",
        "Effect": "Allow"
      }

    • Permissions to access Serverless Spark

      If you want to access an EMR Serverless Spark cluster, the policy attached to the role also contains the following access permissions on Serverless Spark:

      {
        "Effect": "Allow",
        "Action": [
          "emr-serverless-spark:TerminateSqlStatement",
          "emr-serverless-spark:CreateSqlStatement",
          "emr-serverless-spark:GetSqlStatement",
          "emr-serverless-spark:ListSessionClusters",
          "emr-serverless-spark:ListWorkspaces",
          "emr-serverless-spark:ListWorkspaceQueues",
          "emr-serverless-spark:ListReleaseVersions",
          "emr-serverless-spark:CancelJobRun",
          "emr-serverless-spark:ListJobRuns",
          "emr-serverless-spark:GetJobRun",
          "emr-serverless-spark:StartJobRun",
          "emr-serverless-spark:AddMembers",
          "emr-serverless-spark:GrantRoleToUsers"
        ],
        "Resource": "*"
      }

      The policy attached to the role also contains the following access permissions on Object Storage Service (OSS). These permissions are used when you upload SQL files or JAR packages or save ad hoc query results:

      {
        "Action": [
          "oss:PutObject",
          "oss:GetObject",
          "oss:DeleteObject"
        ],
        "Resource": [
          "acs:oss:*:*:*/.dataworks/*",
          "acs:oss:*:*:*/.dlsdata/*"
        ],
        "Effect": "Allow"
      },
      {
        "Action": "oss:PostDataLakeStorageFileOperation",
        "Resource": "*",
        "Effect": "Allow"
      }
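
The `Resource` patterns in the first OSS statement limit object read, write, and delete operations to objects under a `.dataworks/` or `.dlsdata/` path. The following Python sketch shows which object resources such patterns would cover. It assumes that `*` matches any sequence of characters and that an OSS object resource is written as `acs:oss:<region>:<account-id>:<bucket>/<object>`. The helper names, the region, the account ID, the bucket name, and the object paths are hypothetical.

```python
import re

# Resource patterns copied from the OSS statement above.
OSS_OBJECT_PATTERNS = [
    "acs:oss:*:*:*/.dataworks/*",
    "acs:oss:*:*:*/.dlsdata/*",
]


def resource_matches(pattern, resource):
    """Assumption: '*' matches any sequence of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, resource) is not None


def is_covered(resource, patterns=OSS_OBJECT_PATTERNS):
    """Return True if at least one pattern covers the resource."""
    return any(resource_matches(pattern, resource) for pattern in patterns)


# Hypothetical object resources used only for illustration.
SQL_FILE = "acs:oss:cn-hangzhou:1234567890:my-bucket/.dataworks/query.sql"
QUERY_RESULT = "acs:oss:cn-hangzhou:1234567890:my-bucket/.dlsdata/result.csv"
OTHER_OBJECT = "acs:oss:cn-hangzhou:1234567890:my-bucket/reports/query.sql"
```

Under these assumptions, `is_covered(SQL_FILE)` and `is_covered(QUERY_RESULT)` return `True`, whereas `is_covered(OTHER_OBJECT)` returns `False` because the object is outside both paths. The second statement, `oss:PostDataLakeStorageFileOperation`, applies to all resources and is not modeled in this sketch.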