DataWorks: Use the RAM role-based authorization mode to add a data source

Last Updated: Nov 18, 2024

This topic describes how to use the RAM role-based authorization mode to add a data source to improve the security of data in the cloud. In this topic, an Object Storage Service (OSS) data source is used.

Prerequisites

If you want to log on to the DataWorks console and perform the operations that are described in this topic as a RAM user, make sure that the AliyunDataWorksFullAccess and AliyunRAMFullAccess policies are attached to the RAM user. For more information, see Grant permissions to a RAM user.

Note

If you log on to the DataWorks console and perform the operations with an Alibaba Cloud account, this prerequisite does not apply.

The following figure shows how to attach a policy to a RAM user.

image

Background information

Data synchronization tasks read data from and write data to data sources, so the way a data source is accessed directly affects the security of enterprise data in the cloud. DataWorks allows you to use the more secure RAM role-based authorization mode to add and access data sources, such as OSS, AnalyticDB for MySQL 2.0, LogHub, Tablestore, and Hologres data sources. This improves the security of data in the cloud and prevents inappropriate use of data sources and leakage of AccessKey pairs.

You can use the AccessKey pair-based authorization mode or the RAM role-based authorization mode to add a data source. In this topic, the RAM role-based authorization mode is used. You can select a mode based on your business requirements. The following descriptions provide the working principles of the AccessKey pair-based authorization mode and the RAM role-based authorization mode:

  • AccessKey pair-based authorization mode

    The AccessKey pair-based authorization mode is less secure than the RAM role-based authorization mode. In AccessKey pair-based authorization mode, you only need to specify the AccessKey pair of your Alibaba Cloud account or RAM user when you add a data source.

    The following figure shows the parameters that are required to add an OSS data source in the AccessKey pair-based authorization mode. In the Add OSS Data Source dialog box, you must set the AccessKey ID and AccessKey Secret parameters to the AccessKey pair of the account that has permissions to access an OSS bucket.

    Figure: Configure data source

    When a synchronization task for the OSS data source is run or scheduled, DataWorks uses the AccessKey pair to access the data source and read data from or write data to the data source.

    Note

    In AccessKey pair-based authorization mode, OSS data may be leaked if your AccessKey pair is leaked.

  • RAM role-based authorization mode

    The RAM role-based authorization mode is more secure than the AccessKey pair-based authorization mode. In RAM role-based authorization mode, AccessKey pairs are not required. This prevents your AccessKey pair from being leaked.

    In RAM role-based authorization mode, you can authorize the DataWorks service account to assume a RAM role to access OSS without using AccessKey pairs.

    In addition, you can create different roles for different data sources based on your business requirements. This allows you to manage permissions in a fine-grained manner.
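The difference between the two modes can be pictured with a toy model. The following Python sketch is not DataWorks or RAM code: the class names, the `simulate_assume_role` helper, and the one-hour lifetime are all assumptions made for illustration. It only shows that a role-based data source stores a reference to a role instead of a long-lived secret, and that access goes through a credential that expires.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AccessKeyDataSource:
    """Hypothetical record for a data source added in AccessKey pair mode."""
    access_key_id: str
    access_key_secret: str  # long-lived secret kept with the data source


@dataclass(frozen=True)
class RoleDataSource:
    """Hypothetical record for a data source added in RAM role mode."""
    role_arn: str  # only a reference to the role; no secret is stored


@dataclass(frozen=True)
class TemporaryCredential:
    """Stand-in for the short-lived token obtained by assuming a role."""
    role_arn: str
    expires_at: datetime


def stored_secret_fields(source) -> list:
    """Return the names of fields that hold long-lived secret material."""
    return [f.name for f in fields(source) if f.name.endswith("_secret")]


def simulate_assume_role(source: RoleDataSource, now: datetime,
                         lifetime: timedelta = timedelta(hours=1)) -> TemporaryCredential:
    """Simulate role assumption; the lifetime is an assumed value."""
    return TemporaryCredential(role_arn=source.role_arn, expires_at=now + lifetime)
```

In this model, leaking the stored configuration of a `RoleDataSource` exposes no secret, whereas leaking an `AccessKeyDataSource` exposes a credential that stays valid until someone rotates it.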

Process

This section describes the overall process of using the RAM role-based authorization mode to add a data source with an Alibaba Cloud account or as a RAM user.

  1. Use your Alibaba Cloud account or a RAM user to which the AliyunRAMFullAccess policy is attached to log on to the RAM console. Then, create a role to be assumed and a policy to be attached.

    • Role to be assumed: a custom role to be assumed by the DataWorks service account. After the DataWorks service account assumes the role, you can use the DataWorks service account to access OSS based on the permissions that are granted to the role.

    • Policy to be attached: a policy that contains the PassRole permission. After the policy is attached to a RAM user, the RAM user can use the custom role to add a data source or run a synchronization task for the data source.

  2. Use your Alibaba Cloud account or a RAM user to which the AliyunRAMFullAccess policy is attached to log on to the RAM console. Then, grant permissions to the RAM user that you want to use in Steps 3 and 5.

    Note

    In RAM role-based authorization mode, if you use an unauthorized RAM user to add a data source, all synchronization tasks for the data source fail to run.

  3. Log on to the DataWorks console with an Alibaba Cloud account or as a RAM user, and go to the Data Integration page to add a data source in the RAM role-based authorization mode. When a synchronization task for the data source runs, the DataWorks service account assumes the created RAM role to access the data source.

    Note

    A RAM user can be used to perform operations in this step only after the RAM user is granted the required permissions in Step 2.

  4. Go to the DataStudio page by using the Alibaba Cloud account or as the RAM user and create a data synchronization task for the data source that you added.

  5. On the DataStudio or Operation Center page, run the data synchronization task.

    Note

    A RAM user can be used to perform operations in this step only after the RAM user is granted the required permissions in Step 2.

Procedure

  1. Create a role to be assumed and a policy to be attached, and then attach the policy to the role.

    You can create different custom roles for different data sources based on your security requirements.

    Note

    Only an Alibaba Cloud account or a RAM user to which the AliyunRAMFullAccess policy is attached can be used to perform the operations in this step.

    The following scenario is used as an example to show how to create a role to be assumed: An enterprise uses 100 OSS buckets to store all data, and the big data team needs to use data that is stored in one specific OSS bucket. If the preset role AliyunDataWorksAccessingOSSRole is used, the big data team may also be able to access data in the other 99 OSS buckets, which may cause data leakage.

    In this case, the owner of the Alibaba Cloud account can create a custom role named BigDataOssRole for the big data team and allow only the members of the big data team to use the role. This helps isolate permissions across teams.

    1. Create a custom role.

      In this example, a custom role whose trusted entity is an Alibaba Cloud account and whose name is BigDataOssRole is created. For more information about how to create a custom role, see Create a RAM role for a trusted Alibaba Cloud account.

    2. Create a custom policy.

      In this example, a policy that allows users to read data from and write data to a specified bucket (bucket_name_1) is created. For more information about how to create a custom policy, see Create custom policies. The following code shows the policy document:

      {
          "Version": "1",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": [
                      "oss:GetObject",
                      "oss:ListObjects",
                      "oss:GetObjectMetadata",
                      "oss:GetObjectMeta",
                      "oss:GetBucketAcl",
                      "oss:GetBucketInfo",
                      "oss:PutObject",
                      "oss:DeleteObject",
                      "oss:PutBucket"
                  ],
                  "Resource": [
                      "acs:oss:*:*:bucket_name_1",
                      "acs:oss:*:*:bucket_name_1/*"
                  ]
              }
          ]
      }
    3. Attach the policy to the BigDataOssRole role and modify the trust policy of the role.

      Attach the created policy to the BigDataOssRole role. This way, a user that is allowed to use the BigDataOssRole role can read data from and write data to the specified bucket. Then, modify the trust policy of the BigDataOssRole role so that DataWorks Data Integration can assume the role.

      Important

      To use the role, you must perform the operations in this step.

      For more information about how to modify the trust policy of a role, see Edit the trust policy of a RAM role. The following code shows the trust policy document:

      {
          "Statement": [
              {
                  "Action": "sts:AssumeRole",
                  "Effect": "Allow",
                  "Principal": {
                      "Service": [
                          "di.dataworks.aliyuncs.com"
                      ]
                  }
              }
          ],
          "Version": "1"
      }
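When several teams each need their own role, the two policy documents in this step can be generated instead of written by hand. The following Python sketch is one possible way to do that with only the standard library. The function names are hypothetical; the action list and the service principal are copied from the documents above. The output is JSON text that you would still paste into the RAM console or pass to your own tooling.

```python
import json

# Actions copied from the custom policy shown in this step.
OSS_ACTIONS = [
    "oss:GetObject", "oss:ListObjects", "oss:GetObjectMetadata",
    "oss:GetObjectMeta", "oss:GetBucketAcl", "oss:GetBucketInfo",
    "oss:PutObject", "oss:DeleteObject", "oss:PutBucket",
]

# Service principal copied from the trust policy shown in this step.
DATA_INTEGRATION_SERVICE = "di.dataworks.aliyuncs.com"


def build_bucket_policy(bucket_names):
    """Build a permission policy limited to the given buckets.

    Each bucket contributes two resources: the bucket itself and
    every object inside it.
    """
    resources = []
    for name in bucket_names:
        resources.append(f"acs:oss:*:*:{name}")
        resources.append(f"acs:oss:*:*:{name}/*")
    return {
        "Version": "1",
        "Statement": [
            {"Effect": "Allow", "Action": list(OSS_ACTIONS), "Resource": resources}
        ],
    }


def build_trust_policy(service=DATA_INTEGRATION_SERVICE):
    """Build a trust policy that lets the given service assume the role."""
    return {
        "Statement": [
            {
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {"Service": [service]},
            }
        ],
        "Version": "1",
    }


def trusts_service(trust_policy, service=DATA_INTEGRATION_SERVICE):
    """Return True if an Allow statement lets the service call sts:AssumeRole."""
    for statement in trust_policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        services = statement.get("Principal", {}).get("Service", [])
        if isinstance(services, str):
            services = [services]
        if service in services:
            return True
    return False


if __name__ == "__main__":
    print(json.dumps(build_bucket_policy(["bucket_name_1"]), indent=4))
    print(json.dumps(build_trust_policy(), indent=4))
```

A check such as `trusts_service` can be run against an exported trust policy to confirm that the edit described in this step has been made before the role is selected for a data source.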
  2. Allow specific users to assume the role.

    After you determine the role to be assumed, you must attach a policy that contains the PassRole permission to specific users. This way, the users can assume the role to add a data source and run a synchronization task for the data source. You can also establish mappings between users and roles based on your business requirements.

    • Policy template 1: You can create a policy based on the following template. The policy allows authorized users to assume all roles that are related to DataWorks Data Integration. Create a policy by using the template only if it is necessary for your business.

      {
          "Version": "1",
          "Statement": [
              {
                  "Action": "ram:PassRole",
                  "Resource": "*",
                  "Effect": "Allow",
                  "Condition": {
                      "StringEquals": {
                          "acs:Service": "di.dataworks.aliyuncs.com"
                      }
                  }
              }
          ]
      }
    • Policy template 2: You can also create a custom policy that contains the PassRole permission. Then, you can establish mappings between users and roles based on your business requirements.

      Note

      Only an Alibaba Cloud account or a RAM user to which the AliyunRAMFullAccess policy is attached can be used to perform the operations in this step.

      In this example, after you create the BigDataOssRole role for the big data team, you must allow specific users to assume the role based on your business requirements. To do so, create a custom policy named BigDataOssRoleAllowUse and attach the policy to those users. For more information, see Create custom policies. The following code shows the policy document:

      {
          "Version": "1",
          "Statement": [
              {
                  "Action": "ram:PassRole",
                  "Resource": "acs:ram::19122324****:role/BigDataOssRole",
                  "Effect": "Allow",
                  "Condition": {
                      "StringEquals": {
                          "acs:Service": [
                              "oss.aliyuncs.com",
                              "di.dataworks.aliyuncs.com"
                          ]
                      }
                  }
              }
          ]
      }
      Note

      Replace the UID 19122324**** in the preceding code with the UID of your Alibaba Cloud account.

      After you create the BigDataOssRoleAllowUse policy, you can attach the policy to the RAM users who want to assume the BigDataOssRole role. This way, the RAM users can assume the BigDataOssRole role as the access identity to add data sources and run synchronization tasks for the data sources.
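The mapping between users and roles described in this step comes down to which role ARNs appear in each PassRole policy. As a rough illustration, the following Python sketch builds the policy from policy template 2 for a given account UID and role name, and checks whether a policy document names a given role. The helper names and the sample UID are hypothetical. The check is a plain comparison of the Resource field; it ignores the Condition block and does not reproduce how RAM evaluates policies.

```python
import json

# Services copied from policy template 2 in this step.
DEFAULT_SERVICES = ("oss.aliyuncs.com", "di.dataworks.aliyuncs.com")


def role_arn(account_uid, role_name):
    """Format a role ARN in the form used by the policy above."""
    return f"acs:ram::{account_uid}:role/{role_name}"


def build_pass_role_policy(account_uid, role_name, services=DEFAULT_SERVICES):
    """Build a PassRole policy that names exactly one role."""
    return {
        "Version": "1",
        "Statement": [
            {
                "Action": "ram:PassRole",
                "Resource": role_arn(account_uid, role_name),
                "Effect": "Allow",
                "Condition": {"StringEquals": {"acs:Service": list(services)}},
            }
        ],
    }


def _as_list(value):
    """Policy fields may hold one string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def allows_passing(policy, arn):
    """Return True if an Allow ram:PassRole statement lists the ARN or '*'."""
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        if "ram:PassRole" not in _as_list(statement.get("Action", [])):
            continue
        resources = _as_list(statement.get("Resource", []))
        if arn in resources or "*" in resources:
            return True
    return False


if __name__ == "__main__":
    # The UID below is a made-up placeholder; use the UID of your own account.
    print(json.dumps(build_pass_role_policy("1234567890123456", "BigDataOssRole"), indent=4))
```

Under this simplified check, a policy built from template 1 (Resource set to "*") matches every role, while a policy built from template 2 matches only the role it names. That is the reason template 2 is the better fit when permissions must be isolated across teams.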

  3. Add a data source.

    After you are granted the required permissions by the owner of an Alibaba Cloud account, you can add a data source.

    1. Use your Alibaba Cloud account or a RAM user to which the AliyunDataWorksFullAccess policy is attached to add an OSS data source.

      In the Add OSS Data Source dialog box, select RAM Role Authorization Mode for Access Mode and configure other parameters based on your business requirements. The following table describes the parameters. If you use a workspace in standard mode, you can determine whether to add the data source in the development or production environment.

      Note

      In this example, an OSS data source is added. The parameters that you need to configure vary based on the data source type. For more information about how to add an OSS data source, see Add an OSS data source.

      image

      | Parameter | Description |
      | --- | --- |
      | Data Source Name | The name of the data source. The name can contain only letters, digits, and underscores (_), and must start with a letter. |
      | Data Source Description | The description of the data source. The description cannot exceed 80 characters in length. |
      | Region | Select a region from the drop-down list. |
      | Endpoint | The endpoint of OSS. Endpoint format: http://oss.aliyuncs.com. The endpoint of OSS varies based on the region. Note: If you add a bucket name before the endpoint of OSS and a period (.) after the bucket name, the data source can pass the connectivity test, but data synchronization will fail. For example, you cannot set this parameter to http://xxx.oss.aliyuncs.com. |
      | Bucket | The name of the OSS bucket. A bucket is a container that is used to store objects in OSS. You can create one or more buckets and add one or more objects to each bucket. During data synchronization, DataWorks can search for objects only in the bucket that is specified by this parameter. |
      | Access Mode | The mode that is used to access the data source. In this example, RAM Role Authorization Mode is used. Then, the DataWorks service account can assume the related role to access the data source by using an STS token. This ensures higher security. |
      | Role | The role to be assumed. Select a RAM role from the drop-down list. |

    2. Test the network connectivity.

      In the Connection Configuration section, click the Data Integration tab. Then, find the required resource group and click Test Network Connectivity in the desired column.

      A synchronization task can use only one type of resource group. To ensure that your synchronization tasks can be run as expected, you must test the connectivity between the resource groups for Data Integration on which your synchronization tasks are run and the data sources. If you want to test the connectivity of multiple resource groups for Data Integration at a time, select the resource groups and click Batch Test Network Connectivity. For more information, see Network connectivity solutions.

    3. If the connectivity test is successful, click Complete Creation.
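Some of the parameter rules in the table above can be checked before you open the dialog box, which helps when data source settings are prepared in bulk. The following Python sketch checks the naming rule, the description length limit, and the endpoint rule. The function name is hypothetical, and treating "letters" as ASCII letters is an assumption; DataWorks itself performs the authoritative validation.

```python
import re
from urllib.parse import urlparse

# Assumption: "letters" means ASCII letters.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
MAX_DESCRIPTION_LENGTH = 80


def check_oss_data_source(name, description, endpoint, bucket):
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []

    # The name must start with a letter and use only letters, digits, and "_".
    if not NAME_PATTERN.fullmatch(name):
        problems.append("invalid data source name")

    # The description cannot exceed 80 characters.
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description is longer than 80 characters")

    # The endpoint must not be prefixed with "<bucket>.". This is a simple
    # prefix check on the host name, so a bucket whose name equals the first
    # label of the endpoint itself would also be flagged.
    host = urlparse(endpoint).hostname or ""
    if host.startswith(bucket.lower() + "."):
        problems.append("endpoint must not start with the bucket name")

    return problems
```

For example, `check_oss_data_source("oss_source", "demo", "http://xxx.oss.aliyuncs.com", "xxx")` reports the endpoint problem that, according to the table, would pass the connectivity test but cause data synchronization to fail.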

  4. Create a data synchronization task.

    After you add a data source, you can go to the DataStudio page and create a data synchronization task for the data source. For more information, see Configure a batch synchronization task by using the codeless UI.

  5. Run the data synchronization task.

    On the DataStudio or Operation Center page, run the created data synchronization task.

    Note

    Make sure that you are granted the required permissions in Step 2 before you run tasks on the DataStudio page. Otherwise, the tasks fail to run.