
E-MapReduce:Manage workspaces

Last Updated:Sep 04, 2024

A workspace is the basic unit in which E-MapReduce (EMR) Serverless Spark manages jobs and members, assigns roles to members, and grants permissions to members. All configuration is performed, and all jobs and workflows are run, within a specific workspace. A workspace administrator can add users to the workspace as members and assign them roles, such as administrator, data analyst, data developer, or guest, so that members with different roles can collaborate with each other. This topic describes the basic operations that you can perform on a workspace.

Prerequisites

  • An Alibaba Cloud account is created and real-name verification is complete for the account.

  • The account that you want to use to create a workspace is prepared and the required permissions are granted to the account.

    • If you want to create a workspace by using an Alibaba Cloud account, prepare an Alibaba Cloud account and assign roles to the Alibaba Cloud account. For more information, see Assign roles to an Alibaba Cloud account.

    • If you want to create a workspace as a RAM user, prepare a RAM user and attach the AliyunEMRServerlessSparkFullAccess, AliyunOSSFullAccess, and AliyunDLFFullAccess policies to the RAM user. Then, add the RAM user on the Access Control page and assign the administrator role to the RAM user. For more information, see Grant permissions to a RAM user and Manage users and roles.

  • Data Lake Formation (DLF) is activated. For more information, see Getting Started. For information about the regions in which DLF is supported, see Supported regions and endpoints.

  • Object Storage Service (OSS) is activated and a bucket is created. For more information, see Activate OSS and Create a bucket.
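The OSS bucket that you prepare is later referenced as the Workspace Directory in the form of an oss:// path. The following sketch shows an illustrative client-side check of the standard OSS bucket-naming rules (3 to 63 characters; lowercase letters, digits, and hyphens; must start and end with a lowercase letter or digit) and how such a path might be composed. The helper names and the default prefix are assumptions for illustration, not part of any official SDK.

```python
import re

# OSS bucket-naming rules: 3 to 63 characters; lowercase letters, digits,
# and hyphens (-); must start and end with a lowercase letter or digit.
BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")

def is_valid_bucket_name(name: str) -> bool:
    """Return True if the name satisfies the OSS bucket-naming rules."""
    return bool(BUCKET_NAME.fullmatch(name))

def workspace_directory(bucket: str, prefix: str = "emr-serverless-spark") -> str:
    """Compose an oss://<bucket>/<prefix> path for use as the Workspace Directory."""
    if not is_valid_bucket_name(bucket):
        raise ValueError(f"invalid OSS bucket name: {bucket!r}")
    return f"oss://{bucket}/{prefix.strip('/')}"
```

The bucket name and prefix in the usage below are placeholders; substitute your own values.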

Create a workspace

  1. Go to the Spark page.

    1. Log on to the EMR console.

    2. In the left-side navigation pane, choose EMR Serverless > Spark.

  2. On the Spark page, click Create Workspace.

  3. On the E-MapReduce Serverless Spark page, configure the following parameters.

    • Region: The region in which the workspace resides. We recommend that you select the region where your data resides. You cannot change the region after the workspace is created.

    • Billing Method: The billing method of the workspace. Only the pay-as-you-go billing method is supported.

    • Workspace Name: The name of the workspace. The name must be 1 to 60 characters in length and can contain only letters, digits, and hyphens (-). The name must start with a letter.

      Note
      • Workspace names must be unique within the same Alibaba Cloud account. If you enter the name of an existing workspace, the system prompts you to enter a different name.

      • You cannot change the name of a workspace after the workspace is created.

    • DLF for Metadata Storage: Specifies whether to use Data Lake Formation (DLF) to store and manage your metadata. Select the ID of the DLF catalog with which you want to associate the workspace. You can also perform the following operations to create a catalog:

      1. Click Create Catalog. In the popover that appears, configure the Catalog ID parameter and click OK.

      2. Select the DLF catalog that you created from the drop-down list.

      Note
      After you create a workspace, you can associate an existing DLF catalog with the workspace. For more information, see Data catalog.

    • Maximum Quota: The maximum number of compute units (CUs) that can be concurrently used to process jobs in the workspace.

    • Workspace Directory: The path of the Object Storage Service (OSS) bucket that is used to store data files, such as job logs, running events, and resources. If you want to view quasi-real-time incremental logs during O&M, we recommend that you use a bucket for which OSS-HDFS is enabled.

    • Advanced Settings: You can configure the following parameter in this section:

      Execution Role: The role that EMR Serverless Spark assumes to run jobs. Select AliyunEMRSparkJobRunDefaultRole.

  4. Click Create Workspace.
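The workspace-name constraints described above can be checked on the client side before you submit the form. The following is a minimal sketch; the helper name is illustrative and not part of any official SDK.

```python
import re

# Workspace-name rules from the parameter description: 1 to 60 characters,
# only letters, digits, and hyphens (-), and the first character must be
# a letter.
WORKSPACE_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9-]{0,59}$")

def is_valid_workspace_name(name: str) -> bool:
    """Return True if the name satisfies the documented constraints."""
    return bool(WORKSPACE_NAME.fullmatch(name))
```

Note that uniqueness within the Alibaba Cloud account can be verified only by the server when you submit the form.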

Delete a workspace

Important
  • Before you delete a workspace, make sure that no running jobs exist in the workspace. If a job is running in the workspace, the system prompts you to stop the job before you delete the workspace.

  • After a workspace is deleted, the resources in the workspace, including jobs and data, are released and cannot be restored. To avoid data loss, you must back up the job scripts before you delete the workspace.

  • After a workspace is released, data that is associated with the workspace and stored in OSS or OSS-HDFS, such as logs, is retained.

  1. On the Spark page, find the desired workspace and click Delete in the Actions column.

  2. In the dialog box that appears, enter the name of the workspace and click OK.

References

  • If you want to add more RAM users to a workspace and assign them different roles to enable collaborative development, you can perform these operations on the Permissions page. For more information, see Manage users and roles.

  • If you want to isolate and manage resources, you can add queues. For more information, see Manage resource queues.