Create and manage a workspace

Updated at: 2025-03-10 09:01

A workspace is a key concept in Platform for AI (PAI). Workspaces allow enterprises and teams to manage computing resources and user permissions in a centralized manner. They also provide AI developers with tools to collaborate across teams throughout the AI development workflow and to manage AI computing assets. This topic describes how to create, configure, and manage a workspace.

Prerequisites

PAI is activated. For more information, see Activate PAI and create a default workspace.

Limits

  • Only the administrator or owner of a workspace can modify workspace configurations.

  • The voice call, text message, and email methods for event notifications are supported only in the China (Hangzhou), China (Shanghai), and China (Ulanqab) regions.

Account and permission requirements

  • Alibaba Cloud account: You can use an Alibaba Cloud account to perform all operations without additional authorization.

  • RAM user: You must attach the AliyunPAIFullAccess policy to the RAM user. AliyunPAIFullAccess allows the RAM user to obtain all permissions of PAI. Proceed with caution. We recommend that you use an Alibaba Cloud account.

Create a workspace

Go to the Workspace Details page of PAI, click Create Workspace, and configure the parameters.

  1. Description of key parameters in the Basic Information step:

    • Add Member: Add members and roles to the workspace. You can skip this parameter and add the members and roles to the workspace after the workspace is created. For more information, see Configure Member and Role.

    • Workspace Default Storage: We recommend that you specify the default storage path for the workspace to store the temporary data and models that are generated during model training. This allows you to manage the data in a unified manner.

  2. Description of key parameters in the Associate Resource step:

    • Intelligent Computing Lingjun resources: provide high-performance computing resource groups for model development and training. The resources offer the advantages of high performance, high efficiency, and high resource utilization.

    • General Computing Resources: provide dedicated general-purpose computing resources for AI development to improve development and training efficiency. For more information, see Create a dedicated resource group and purchase general computing resources.

    • ACS Computing Resources: provide resources for Deep Learning Container (DLC) or Elastic Algorithm Service (EAS) inference to start and schedule tasks or services. For more information, see Clusters.

    • MaxCompute Resources: provide CPU resources for specific algorithms in Machine Learning Designer. For more information, see MaxCompute resource quotas.

    • Fully Managed Flink Resources: provide resources for the training of large-scale distributed models in PAI. For more information, see Flink resource quotas.

    For more information about AI computing resources, see AI computing resources.

  3. Confirm the information and go to the workspace that you created.

    After you go to the workspace details page, you can view all the PAI features in the left-side navigation pane. You can use the features to manage the full lifecycle of AI development based on your business requirements. For more information, see AI development.

Manage a workspace

Go to the Workspace Details page, enter the workspace that you want to manage, and click Configure Workspace in the upper-right corner of the page that appears.

The page contains the following tabs:

  • Configure Computing Resource

  • Configure Member and Role

  • DataWorks Scheduling Settings

  • Configure Event Notification

  • Configure Storage Path

  • Configure SLS

  • General Configurations

Configure Computing Resource

View and associate computing resources:

Note

You cannot disassociate resources from a workspace by yourself. To disassociate resources, contact your account manager.

  • Intelligent Computing Lingjun resources: provide high-performance computing resource groups for model development and training. The resources offer the advantages of high performance, high efficiency, and high resource utilization.

  • General Computing Resources: provide dedicated general-purpose computing resources for AI development to improve development and training efficiency. For more information, see Create a dedicated resource group and purchase general computing resources.

  • ACS Computing Resources: provide resources for Deep Learning Container (DLC) or Elastic Algorithm Service (EAS) inference to start and schedule tasks or services. For more information, see Clusters.

  • MaxCompute Resources: provide CPU resources for specific algorithms in Machine Learning Designer. For more information, see MaxCompute resource quotas.

  • Fully Managed Flink Resources: provide resources for the training of large-scale distributed models in PAI. For more information, see Flink resource quotas.

For more information about AI computing resources, see AI computing resources.

Configure Member and Role

If multiple RAM users need to develop, manage, or perform O&M in a workspace, you must add the users as workspace members and assign different roles to the members. PAI provides multiple types of roles. You can view the mappings between roles and permissions and assign different roles to members based on your business requirements.

  • Add a member or a role

    image

    You can select one or more roles for a RAM user based on your business requirements. The following role types are supported.

    Basic Role

    Basic Role includes the following roles:

    • Administrator: This role has the permissions to modify members and manage resource groups and all assets in a workspace.

    • Algorithm Developer: This role has the permissions to develop and train models in a workspace.

    • Algorithm O&M Engineer: This role has the permissions to manage job priorities, publish models, and monitor online services.

    • Labeling Administrator: This role has the permissions to use iTAG.

    • Visitor: This role has the read-only permissions on all assets in the workspace.

    Computing Resource Role

    MaxCompute Developer: The developer role in DataWorks. This role has the permissions to develop data in MaxCompute. You can assign this role to the RAM users that you want to use to submit jobs from PAI to MaxCompute.

    Custom Role

    The following figure shows how to create a custom role.

    image

    Permission description:

    • No Permissions: No permissions on a service.

    • Read-only: The permissions to view the resources owned by a specific member and the resources that are visible to all members in a service.

    • Modify/Execute: The permissions to modify and manage the resources owned by a specific member in a service.

    • Full Access: Full management permissions on all resources in a service.

  • Modify the role of a member

    image

    Relationships between members and roles:

    • Each member must be assigned at least one role.

    • You cannot delete the Owner role. An Alibaba Cloud account or RAM user that is used to create a workspace automatically becomes the owner of the workspace. The owner has the permissions to modify the members of the workspace, reference and manage resource groups, and manage all assets in the workspace.
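The permission levels and membership rules above can be read as a small set of checks. The following Python sketch is illustrative only: the names `Permission`, `can_view`, `can_modify`, and `validate_role_change` are hypothetical, and the sketch assumes that each permission level includes the rights of the levels below it, which this topic does not state explicitly. PAI enforces these rules itself; the code only makes them concrete.

```python
from enum import IntEnum
from typing import Set


class Permission(IntEnum):
    """Permission levels of a custom role, ordered from weakest to strongest.

    Assumption: each level includes the rights of the levels below it.
    """
    NO_PERMISSIONS = 0
    READ_ONLY = 1
    MODIFY_EXECUTE = 2
    FULL_ACCESS = 3


def can_view(level: Permission, is_owner: bool, visible_to_all: bool) -> bool:
    """Read-only covers a member's own resources and resources visible to all members."""
    if level == Permission.FULL_ACCESS:
        return True
    return level >= Permission.READ_ONLY and (is_owner or visible_to_all)


def can_modify(level: Permission, is_owner: bool) -> bool:
    """Modify/Execute covers only resources owned by the member; Full Access covers all."""
    if level == Permission.FULL_ACCESS:
        return True
    return level >= Permission.MODIFY_EXECUTE and is_owner


def validate_role_change(current: Set[str], updated: Set[str]) -> None:
    """Reject role changes that break the member-role rules."""
    if not updated:
        raise ValueError("Each member must be assigned at least one role.")
    if "Owner" in current and "Owner" not in updated:
        raise ValueError("The Owner role cannot be deleted.")


print(can_view(Permission.READ_ONLY, is_owner=False, visible_to_all=True))   # True
print(can_modify(Permission.MODIFY_EXECUTE, is_owner=False))                 # False
validate_role_change({"Owner"}, {"Owner", "Algorithm Developer"})            # passes
```

A member with Modify/Execute can change only the resources that the member owns, so the second call returns False even though the level is higher than Read-only.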

DataWorks Scheduling Settings

PAI provides a resource management and scheduling mechanism that allows workspace administrators to flexibly schedule resources in a workspace based on business requirements.

Note

In the Role drop-down list, Non-workspace members refers to users that the administrator has not added to the workspace but to whom the Alibaba Cloud account has granted the related RAM permissions. These users can also use resources and submit jobs, so you can define separate constraints for them.

Configure Event Notification

PAI provides a notification mechanism for workspaces. You can create notification rules to monitor the status of DLC jobs and pipeline jobs, or trigger related events based on the approval status of model versions.

  1. (Optional) Grant permissions required to create a notification rule.

    The first time you create a notification rule, you must activate EventBridge and attach the AliyunServiceRoleForPAIWorkspace role to the PAI workspace. Procedure:

    1. Activate EventBridge.

      To facilitate account management, PAI automatically creates a custom event bus that is named in the pai-system-${Workspace name} format for each workspace. You can log on to the EventBridge console, switch to the region where the workspace resides, and manage the custom event bus.

    2. Click Authorize Now and grant the required permissions.

      In this case, the system automatically creates the service-linked role AliyunServiceRoleForPAIWorkspace. For more information about the service-linked role, see Appendix: Service-linked role AliyunServiceRoleForPAIWorkspace.

    3. Use the following code to create a custom policy and attach the policy to the RAM user.

      {
        "Statement": [{
          "Effect": "Allow",
          "Action": [
            "eventbridge:CreateEventBus",
            "eventbridge:GetEventBus",
            "eventbridge:DeleteEventBus",
            "eventbridge:ListEventBuses",
            "eventbridge:CreateRule",
            "eventbridge:GetRule",
            "eventbridge:UpdateRule",
            "eventbridge:EnableRule",
            "eventbridge:DisableRule",
            "eventbridge:DeleteRule",
            "eventbridge:ListRules",
            "eventbridge:PutEvents",
            "eventbridge:UpdateTargets",
            "eventbridge:DeleteTargets",
            "eventbridge:ListTargets"
          ],
          "Resource": "acs:eventbridge:*:*:eventbus/*"
        }],
        "Version": "1"
      }
  2. Create an event notification rule.

    Key parameters:

    Event Type

    Supported event types:

    • Pipeline Jobs: Machine Learning Designer pipeline jobs. Valid values in the Event Type drop-down list: Job Failure and Job Completed (Succeeded or Failed).

    • DLC Jobs: DLC jobs. Valid values in the Event Type drop-down list: Job Progress (including Enter Queue, Start Bidding, Start Run, and Job Failure), Automatic Fault Tolerance, Job Timeout (you must configure a timeout rule in DataWorks Scheduling Settings), and Other Events (including Job Preempted and Job Manually Stopped).

    • Models: models registered in AI Computing Asset Management. Valid values in the Event Type drop-down list: Version Approved and Version Status Changed (Approved or Rejected). If you select Version Approved, the system sends an event notification when the model approval status changes from Pending to Approved.

    Event Target

    • DingTalk Notification: You must configure the Webhook and Add Signature parameters. For more information, see the Appendix: Obtain a webhook URL and a key section of this topic. You can click Test Connectivity to check whether the configured content is valid.

    • HTTPS/HTTP: This option is available only when you set Event Type to Models. Set the URL parameter to the URL of your HTTP or HTTPS operation. The system calls this operation when the model version status changes. The operation must parse the request based on the specification template.

    • Voice Calls: This option is available only when you set Event Type to Pipeline Jobs or DLC Jobs. You must configure contacts. If no contacts are available, you can configure notifications. For more information, see How do I configure notifications?

    • Text Message: The configuration method is the same as that for Voice Calls.

    • Email: The configuration method is the same as that for Voice Calls.

    Important

    By default, a single event rule supports up to 5 event targets. If this quota does not meet your requirements, you can apply for a higher quota. We recommend that you request no more than 100 event targets. Note: For the voice call, text message, and email methods, each contact that you add consumes one unit of the quota, and contacts are counted without deduplication across methods. For example, if Alice and Tony are added to the text message method and Alice and Alan are added to the email method, 4 units of the quota are used.
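The event target rules above (which target fits which event type, where contact-based targets are supported, and how contacts count against the quota) can be sketched as follows. This is a minimal illustration, not PAI code: the function names are hypothetical, and two points are assumptions that this topic does not confirm, namely that Text Message and Email share the event-type restriction of Voice Calls, and that DingTalk Notification has no event-type or region restriction.

```python
from typing import Dict, List

# Regions where voice call, text message, and email targets are supported (see Limits).
CONTACT_REGIONS = {"China (Hangzhou)", "China (Shanghai)", "China (Ulanqab)"}
CONTACT_TARGETS = {"Voice Calls", "Text Message", "Email"}
DEFAULT_TARGET_QUOTA = 5  # default number of event targets per event rule


def target_allowed(event_category: str, target: str, region: str) -> bool:
    """Check whether an event target can be used for an event category in a region."""
    if target == "HTTPS/HTTP":
        return event_category == "Models"
    if target in CONTACT_TARGETS:
        # Assumption: Text Message and Email follow the same restriction as Voice Calls.
        return event_category in {"Pipeline Jobs", "DLC Jobs"} and region in CONTACT_REGIONS
    # Assumption: DingTalk Notification is not restricted by event type or region.
    return target == "DingTalk Notification"


def quota_used(contacts_by_method: Dict[str, List[str]]) -> int:
    """Count contacts per method; the same contact in two methods counts twice."""
    return sum(len(contacts) for contacts in contacts_by_method.values())


usage = quota_used({"Text Message": ["Alice", "Tony"], "Email": ["Alice", "Alan"]})
print(usage)                          # 4
print(usage <= DEFAULT_TARGET_QUOTA)  # True
print(target_allowed("DLC Jobs", "HTTPS/HTTP", "China (Hangzhou)"))  # False
```

Alice appears under both methods and is counted twice, which matches the example in the note above.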

Appendix: Obtain a webhook URL and a key

  1. Find the desired DingTalk group and follow the instructions shown in the following figure to add a DingTalk chatbot.

    image: Chatbot

  2. Follow the instructions shown in the following figure to open the Add Robot dialog box.

    image

  3. In the Add Robot dialog box, configure the parameters shown in the following figure, copy the key, and then click Finished.

    Important

    Save the copied key to your computer for later use.

    image: Add Robot

  4. In the Add Robot dialog box, click Copy to the right of the webhook URL, and then click Finished.

    Important

    Save the webhook URL to your computer for later use.

    image: Add Robot

The key and webhook URL that you obtained in Steps 3 and 4 are the values for the Additional Signature and Webhook parameters in the Create Event Rule panel on the Configure Event Notification tab.
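After you configure the webhook URL and key in PAI, PAI signs the notification requests for you. To check the webhook URL and key outside PAI, for example when Test Connectivity fails, you can sign a request yourself. The following sketch assumes the signing scheme that DingTalk documents for custom chatbots with the signature security setting: HMAC-SHA256 over the timestamp and key, Base64-encoded and URL-encoded. The function name `sign_webhook_url` and the sample values are hypothetical placeholders.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse
from typing import Optional


def sign_webhook_url(webhook_url: str, secret: str, timestamp_ms: Optional[int] = None) -> str:
    """Append the timestamp and sign parameters to a DingTalk chatbot webhook URL."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    # The string to sign is the millisecond timestamp, a newline, and the key.
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        digestmod=hashlib.sha256,
    ).digest()
    sign = urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))
    return f"{webhook_url}&timestamp={timestamp_ms}&sign={sign}"


# Placeholder values: replace them with the webhook URL and key that you saved.
signed_url = sign_webhook_url(
    "https://oapi.dingtalk.com/robot/send?access_token=YOUR_TOKEN",
    "SEC_YOUR_KEY",
)
print(signed_url)
```

Sending a JSON text message to the signed URL with an HTTP POST request should then deliver the message to the DingTalk group, provided that the key matches the chatbot's signature setting.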

Configure Storage Path

You must specify a default storage path for the workspace.

  • We recommend that you specify the default storage path for the workspace to store the temporary data and models that are generated during model training. This allows you to manage the data in a unified manner.

  • If you also specify a pipeline storage path in Machine Learning Designer, the specified pipeline storage path takes precedence over the default storage path of the workspace when the pipeline is run.
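The precedence rule above can be expressed as a short lookup. The following sketch is illustrative: `effective_storage_path` is a hypothetical helper and the paths are made-up examples, not values from PAI.

```python
from typing import Optional


def effective_storage_path(workspace_default: Optional[str],
                           pipeline_path: Optional[str]) -> Optional[str]:
    """Return the path used when a pipeline runs.

    A pipeline storage path set in Machine Learning Designer takes precedence
    over the default storage path of the workspace.
    """
    return pipeline_path or workspace_default


# Hypothetical example paths.
print(effective_storage_path("oss://example-bucket/workspace/", None))
# oss://example-bucket/workspace/
print(effective_storage_path("oss://example-bucket/workspace/", "oss://example-bucket/pipeline-a/"))
# oss://example-bucket/pipeline-a/
```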

Configure SLS

You can send logs that are generated for the Data Science Workshop (DSW) instances and DLC jobs in the current workspace to Simple Log Service for custom analysis.

Parameters:

  • SLS Project: The project in Simple Log Service that is used to isolate and manage resources. If no project is available, you can create a project.

  • LogStore: The Simple Log Service Logstore that is used to collect, store, and query logs. If no Logstore is available, you can create a Logstore.

  • Modules that require SLS storage: The supported options are Deep Learning Containers (DLC) and Interactive Modeling (DSW).

General Configurations

You can enable and disable features and control the access permissions for node containers of DLC jobs on this tab. This tab also provides a switch for connecting to DSW instances by using SSH and a switch for accessing DSW instances over the Internet. This improves the flexibility and security of user access to instances.

FAQ

What do I do if the "The name already exists." error message appears when I create a workspace?

If the error appears but no workspace with that name is listed in the PAI console, a workspace with the same name probably already exists in the DataWorks console. PAI and DataWorks share workspaces, so a name that is used in either console is unavailable. Change the name to one that is unique.
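Because the name must be unique across both consoles, a pre-check has to look at both workspace lists. The following sketch shows the idea with plain lists. The function name and the sample workspace names are hypothetical, and the exact-match comparison is an assumption, because this topic does not describe how names are compared.

```python
from typing import Iterable


def is_workspace_name_available(name: str,
                                pai_names: Iterable[str],
                                dataworks_names: Iterable[str]) -> bool:
    """A name is available only if neither PAI nor DataWorks already uses it."""
    taken = set(pai_names) | set(dataworks_names)
    # Assumption: names are compared as exact strings.
    return name not in taken


pai_workspaces = ["vision_team"]      # names visible in the PAI console
dataworks_workspaces = ["etl_prod"]   # names visible only in the DataWorks console

print(is_workspace_name_available("etl_prod", pai_workspaces, dataworks_workspaces))    # False
print(is_workspace_name_available("nlp_research", pai_workspaces, dataworks_workspaces))  # True
```

The first call reproduces the FAQ scenario: the name is not in the PAI list, yet it is unavailable because DataWorks already uses it.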
