
Platform for AI: Associate a RAM role with a DLC job

Last Updated: Oct 25, 2024

To access other Alibaba Cloud services from a Deep Learning Containers (DLC) job, you must configure an AccessKey pair for identity authentication unless a RAM role is associated with the job. If you associate a RAM role with the DLC job, the job can access other Alibaba Cloud services by using a temporary access credential provided by Security Token Service (STS), and you do not need to configure an AccessKey pair. This reduces the risk of AccessKey pair leaks. This topic describes how to create a RAM role, associate the RAM role with a DLC job, and obtain a temporary access credential provided by STS by using the RAM role.

Benefits

You can use a RAM role whose trusted entity is an Alibaba Cloud service. The Alibaba Cloud service can assume the RAM role to implement cross-service access. You can obtain a temporary access credential by using the RAM role to implement identity authentication and access control. This method has the following benefits:

  • Security and confidentiality: You do not need to manage credentials in a DLC job. You can use a temporary access credential provided by STS instead of an AccessKey pair to reduce the risk of AccessKey pair leaks.

  • Convenient management: To control which Alibaba Cloud services each developer can access from a DLC job, you only need to modify the policy attached to the RAM role associated with the job. This allows more convenient and fine-grained permission management.

Limits

A DLC job can be associated with only one RAM role.

Configuration method

Associate a RAM role with a DLC job when you create the DLC job, and obtain a temporary access credential provided by STS by using the RAM role.

Associate a RAM role with a DLC job

Scenario 1: Associate the default role of PAI with a DLC job

The default role of Platform for AI (PAI) is a RAM role to which the normal service role AliyunPAIDLCDefaultRole is assigned. The default role has access permissions only on MaxCompute and Object Storage Service (OSS) and supports fine-grained access control. When you access MaxCompute tables, a temporary access credential provided by using the default role of PAI has the same permissions as the owner of a DLC instance. When you access OSS, a temporary access credential can be used to access only the default OSS bucket configured for the current workspace.

If you associate the default role with a DLC job, you can obtain a temporary access credential to access basic development resources in the DLC job without the need to create another RAM role.

  • Use scenarios

    After you associate the default role of PAI with a DLC job, you do not need to configure an AccessKey pair in the following scenarios:

    • Use MaxCompute SDK to submit a job to a MaxCompute project on which the job owner has the execution permissions.

    • Use OSS SDK to access data in the default OSS bucket configured for the current workspace. For more information about how to configure a default OSS storage path of the current workspace, see Configure the default storage path of a workspace.

  • Configuration method

    On the Create Job page, select Default Roles of PAI for the Instance RAM Role parameter in the Role Information section. For more information, see Submit training jobs.

After you associate the RAM role with the DLC job, you must obtain a temporary access credential by using the RAM role.

Scenario 2: Associate a custom role with a DLC job

If the permissions of a temporary access credential that you obtain by using the default role of PAI cannot meet your requirements, you can create a RAM role and grant permissions to the RAM role to control the range of Alibaba Cloud resources that developers can access in the job. Perform the following steps:

  1. Log on to the RAM console and create a RAM role. For more information, see Create a RAM role for a trusted Alibaba Cloud service.

    Take note of the following key parameters:

    • Select Trusted Entity: Select Alibaba Cloud Service.

    • Role Type: Select Normal Service Role.

    • Select Trusted Service: Select Platform for AI.

  2. Grant permissions to the RAM role.

    You can attach a system policy or a custom policy to the RAM role. This way, the RAM role can access or manage related resources. For more information, see the "Step 3: Grant permissions to a RAM role" section in the Create a RAM role and attach the required policies to the role topic. For example, you can attach the AliyunOSSReadOnlyAccess policy to the RAM role.

    If you use a RAM user, contact the owner of the Alibaba Cloud account to grant the current RAM user the permissions to use the RAM role. For more information, see Grant permissions to a RAM user. Sample policy document:

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "ram:PassRole",
          "Resource": "acs:ram::*:role/${RoleName}"
        }
      ]
    }

    Replace ${RoleName} in the preceding sample policy document with the name of the RAM role that you want to associate with the DLC job.

  3. Associate the RAM role with the DLC job and submit the DLC job. You need to configure only the following key parameters in the Role Information section. For information about other parameters, see Submit training jobs.

    • Instance RAM Role: Select Custom Roles.

    • RAM Role: Select the RAM role that you created in Step 1. After you associate the RAM role with the DLC job, you have the permissions of the RAM role to access other Alibaba Cloud services in the DLC job by using a temporary access credential provided by STS.

After you associate the RAM role with the DLC job, you must obtain a temporary access credential by using the RAM role.
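The RAM user policy shown in step 2 differs between roles only in the role name. As a minimal sketch, the following Python helper fills in the name and prints the resulting policy document. The function name build_pass_role_policy and the role name my-dlc-role are hypothetical and are used only for illustration.

```python
import json


def build_pass_role_policy(role_name: str) -> str:
    """Return the ram:PassRole policy document for one RAM role as a JSON string."""
    if not role_name:
        raise ValueError("role_name must not be empty")
    policy = {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ram:PassRole",
                # Same resource pattern as the sample policy, with ${RoleName} filled in.
                "Resource": f"acs:ram::*:role/{role_name}",
            }
        ],
    }
    return json.dumps(policy, indent=2)


# "my-dlc-role" is a placeholder; use the name of the role that you created in step 1.
print(build_pass_role_policy("my-dlc-role"))
```

The owner of the Alibaba Cloud account can paste the printed document into a custom policy and attach the policy to the RAM user.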

Scenario 3: Do not associate a RAM role with a DLC job

If your DLC job does not need a credential to access data in other Alibaba Cloud services, we recommend that you do not associate a RAM role with the DLC job. When you create a DLC job, select Does Not Associate Role for the Instance RAM Role parameter in the Role Information section. For more information, see Submit training jobs.

Obtain a temporary access credential by using the RAM role associated with a DLC job

If you associate the default role of PAI or a custom role with a DLC job when you create the job, you can use one of the following methods to obtain a temporary access credential:

Method 1: Use the Alibaba Cloud Credentials tool

The Alibaba Cloud Credentials tool calls the local service that is automatically injected when you create a DLC job to obtain a temporary access credential provided by STS. This credential is updated on a regular basis.

When you create a DLC job, complete the following key configurations. For more information, see Submit training jobs.

  • Install the Alibaba Cloud Credentials tool.

    On the Create Job page, select Select from List for the Third-party Libraries parameter and enter alibabacloud_credentials in the Third-party Libraries field to install the Alibaba Cloud Credentials tool.

    Note

    If the third-party library is pre-installed in the image, you can skip this configuration.

  • Configure a script file.

    In this example, a Python script file is used. For more information about sample code of SDKs for other programming languages, see Sample code. You can select Online configuration for the Code Builds parameter, or select Local Upload to upload a script file from your on-premises machine to the DLC environment.

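    from alibabacloud_credentials.client import Client as CredClient
    from alibabacloud_credentials.models import Config as CredConfig

    # Use the credentials_uri type so that the tool obtains the temporary access
    # credential from the local service that is injected into the DLC job.
    credentialsConfig = CredConfig(
        type='credentials_uri'
    )
    # Pass this client to the SDK of the Alibaba Cloud service that you want to access.
    credentialsClient = CredClient(credentialsConfig)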

Method 2: Access the local service of the DLC job

When you create a DLC job, you can set the Startup Command parameter to the following command. This way, you can access the local service that is automatically injected into the DLC job to obtain a temporary access credential. For more information, see Submit training jobs.

# Obtain a temporary access credential for the RAM role of an instance.
curl $ALIBABA_CLOUD_CREDENTIALS_URI

The following output is returned:

{
    "Code": "Success",
    "AccessKeyId": "STS.N*********7",
    "AccessKeySecret": "3***************d",
    "SecurityToken": "DFE32G*******",
    "Expiration": "2024-05-21T10:39:29Z"
}

In the output, take note of the following parameters:

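  • SecurityToken: the STS security token of the temporary access credential for the RAM role. Use it together with AccessKeyId and AccessKeySecret.

  • Expiration: the time at which the temporary access credential for the RAM role expires. The time is in UTC.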
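If the job is written in Python, the same request can be made from the script instead of the startup command. The following is a minimal sketch that assumes the response has exactly the shape of the preceding sample output. The function names parse_credential and fetch_credential are hypothetical; only the ALIBABA_CLOUD_CREDENTIALS_URI environment variable and the field names come from this topic.

```python
import json
import os
import urllib.request
from datetime import datetime, timezone


def parse_credential(payload: str) -> dict:
    """Parse the JSON text returned by the local credential service."""
    data = json.loads(payload)
    # Assumption: any Code other than "Success" indicates a failed request.
    if data.get("Code") != "Success":
        raise RuntimeError(f"credential request failed: {data.get('Code')}")
    # Expiration is a UTC timestamp such as 2024-05-21T10:39:29Z.
    expires_at = datetime.strptime(
        data["Expiration"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "access_key_id": data["AccessKeyId"],
        "access_key_secret": data["AccessKeySecret"],
        "security_token": data["SecurityToken"],
        "expires_at": expires_at,
    }


def fetch_credential(timeout: float = 5.0) -> dict:
    """Query the injected local service, like the curl command does."""
    uri = os.environ["ALIBABA_CLOUD_CREDENTIALS_URI"]
    with urllib.request.urlopen(uri, timeout=timeout) as response:
        return parse_credential(response.read().decode("utf-8"))
```

Call fetch_credential() inside the DLC job and request a new credential before expires_at is reached. Avoid printing the returned values to job logs.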

Method 3: Access the local file of the DLC job

Access the file in the specified path of the DLC container to obtain the temporary access credential by using the RAM role. The file is automatically injected by PAI and refreshed on a regular basis. The path of the file is /mnt/.alibabacloud/credentials. The following sample code provides an example of the file content:

{
    "AccessKeyId": "STS.N*********7",
    "AccessKeySecret": "3***************d",
    "SecurityToken": "DFE32G*******",
    "Expiration": "2024-05-21T10:39:29Z"
}
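Because the file is refreshed on a regular basis, a long-running job should read it again when the cached credential is close to expiration rather than read it only once at startup. The following is a minimal sketch of that pattern. The function names load_file_credential and is_expiring, the ExpiresAt key, and the 5-minute margin are assumptions made for illustration; the file path and field names come from this topic.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Path of the credential file that PAI injects into the DLC container.
CREDENTIALS_PATH = Path("/mnt/.alibabacloud/credentials")


def load_file_credential(path: Path = CREDENTIALS_PATH) -> dict:
    """Read the credential file and add a parsed, timezone-aware ExpiresAt value."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Expiration is a UTC timestamp such as 2024-05-21T10:39:29Z.
    data["ExpiresAt"] = datetime.strptime(
        data["Expiration"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return data


def is_expiring(cred: dict, margin: timedelta = timedelta(minutes=5), now=None) -> bool:
    """Return True if the credential expires within the margin (hypothetical default: 5 minutes)."""
    now = now or datetime.now(timezone.utc)
    return cred["ExpiresAt"] - now <= margin
```

A job can keep the result of load_file_credential() in memory and call it again whenever is_expiring() returns True.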

Examples

Example 1: Access MaxCompute by using a RAM role associated with a DLC job

When you create a DLC job, complete the following key configurations. For more information, see Submit training jobs.

  • Install the Alibaba Cloud Credentials tool.

    Set the Third-party Libraries parameter to Select from List and enter the following third-party libraries to install Alibaba Cloud Credentials and MaxCompute SDK.

    alibabacloud_credentials
    pyodps

    Note

    If the third-party libraries are pre-installed in the image, you can skip this configuration.

  • Configure a script file.

    In this example, a Python script file is used. You can select Online configuration for the Code Builds parameter, or select Local Upload to upload a script file from your on-premises machine to the DLC environment. Then, configure a Mount Path such as /mnt/data/.

    from alibabacloud_credentials import providers
    from odps.accounts import CredentialProviderAccount
    from odps import ODPS
    
    if __name__ == '__main__':
        account = CredentialProviderAccount(providers.DefaultCredentialsProvider())
        o = ODPS(
            account=account,
            project="{odps_project}",  # Replace {odps_project} with the name of your project.
            endpoint="{odps_endpoint}"  # Replace {odps_endpoint} with the endpoint of the region where your project resides.
        )
    
        for t in o.list_tables():
            print(t)
    
  • Configure a startup command.

    Set Startup Command to the command that runs the script, for example, python /mnt/data/xx.py.

  • Configure Role Information.

    Select Default Roles of PAI for Instance RAM Role.

Example 2: Access OSS by using a RAM role associated with a DLC job

When you create a DLC job, complete the following key configurations. For more information, see Submit training jobs.

  • Install the Alibaba Cloud Credentials tool.

    Set the Third-party Libraries parameter to Select from List and enter the following third-party libraries to install Alibaba Cloud Credentials and OSS SDK.

    alibabacloud_credentials
    oss2

    Note

    If the third-party libraries are pre-installed in the image, you can skip this configuration.

  • Configure a script file.

    In this example, a Python script file is used. You can select Online configuration for the Code Builds parameter, or select Local Upload to upload a script file from your on-premises machine to the DLC environment. Then, configure a Mount Path such as /mnt/data/.

    import oss2
    from alibabacloud_credentials import providers
    from itertools import islice

    # The default credentials provider resolves the temporary access credential in the DLC job.
    auth = oss2.ProviderAuth(providers.DefaultCredentialsProvider())
    bucket = oss2.Bucket(
        auth,
        '{oss_endpoint}',  # Replace {oss_endpoint} with the endpoint of the region where your OSS bucket resides.
        '{oss_bucket}'  # Replace {oss_bucket} with the name of your OSS bucket.
    )
    # Print the keys of up to 10 objects in the bucket.
    for b in islice(oss2.ObjectIterator(bucket), 10):
        print(b.key)
    
  • Configure a startup command.

    Set Startup Command to the command that runs the script, for example, python /mnt/data/xx.py.

  • Configure Role Information.

    Select Default Roles of PAI for Instance RAM Role.

FAQ

What do I do if an error occurs when I associate a custom role with a DLC job during job creation?

  • The error message is "check permission for ram role failed" or "check permission for sub user failed".

    To resolve this issue, log on to the RAM console to check whether the RAM role exists.

    • If the RAM role does not exist, change the RAM role to an existing role.

    • If the RAM role exists, contact the owner of the Alibaba Cloud account to grant the current RAM user the permissions to use the RAM role. For more information, see Grant permissions to a RAM user. The following sample code shows the policy document. You must replace ${RoleName} with the name of the RAM role.

      {
        "Version": "1",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": "ram:PassRole",
            "Resource": "acs:ram::*:role/${RoleName}"
          }
        ]
      }

  • The error message is "Failed to assume role for user".

    In most cases, this error occurs because the trust policy of the RAM role does not allow PAI to assume the role. To add PAI to the trust policy of the RAM role, perform the following steps:

    1. Log on to the RAM console as a RAM user who has administrative rights.

    2. In the left-side navigation pane, choose Identities > Roles.

    3. On the Roles page, click the name of the RAM role that you created.

    4. On the Trust Policy tab, click Edit Trust Policy.

    5. Modify the content of the trust policy and click Save trust policy document.

      The following sample code shows the original policy document of the RAM role:

      {
        "Statement": [
          {
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Principal": {
              "RAM": [
                "acs:ram::aaa:root"
              ],
              "Service": [
                "xxx.aliyuncs.com"
              ]
            }
          }
        ],
        "Version": "1"
      }

      The following sample code shows the new policy document of the RAM role:

      {
        "Statement": [
          {
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Principal": {
              "RAM": [
                "acs:ram::aaa:root"
              ],
              "Service": [
                "xxx.aliyuncs.com",
                "pai.aliyuncs.com" 
              ]
            }
          }
        ],
        "Version": "1"
      }
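
The change between the original and the new trust policy is the addition of pai.aliyuncs.com to the Service list. As a minimal sketch, the following Python function applies that change to a trust policy that has the same shape as the preceding samples. The function name ensure_pai_trusted is hypothetical, and the sketch assumes that Action is a single string, as in the samples.

```python
import copy

PAI_SERVICE = "pai.aliyuncs.com"


def ensure_pai_trusted(trust_policy: dict) -> dict:
    """Return a copy of the trust policy in which PAI can assume the role."""
    updated = copy.deepcopy(trust_policy)
    for statement in updated.get("Statement", []):
        # Only touch statements that allow sts:AssumeRole.
        if statement.get("Effect") != "Allow" or statement.get("Action") != "sts:AssumeRole":
            continue
        principal = statement.setdefault("Principal", {})
        services = principal.setdefault("Service", [])
        if isinstance(services, str):
            # Normalize a single service name to a list.
            services = [services]
            principal["Service"] = services
        if PAI_SERVICE not in services:
            services.append(PAI_SERVICE)
    return updated
```

The function does not modify its input and returns an equal document if PAI is already trusted, so it is safe to apply to a policy more than once before you paste the result into the Edit Trust Policy dialog box.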