E-MapReduce: Create a workspace

Last Updated: Sep 04, 2024

A workspace is the basic unit of E-MapReduce (EMR) Serverless Spark. You can manage jobs, members, roles, and permissions at the workspace level. All job development takes place in a workspace, so you must create a workspace before you can develop jobs. This topic describes how to create a workspace on the EMR Serverless Spark page.

Prerequisites

  • An Alibaba Cloud account is created and real-name verification is complete for the account.

  • The account that you want to use to create a workspace is prepared and the required permissions are granted to the account.

    • If you want to create a workspace by using an Alibaba Cloud account, prepare an Alibaba Cloud account and assign roles to the Alibaba Cloud account. For more information, see Assign roles to an Alibaba Cloud account.

    • If you want to create a workspace as a RAM user, prepare a RAM user and attach the AliyunEMRServerlessSparkFullAccess, AliyunOSSFullAccess, and AliyunDLFFullAccess policies to the RAM user. Then, add the RAM user on the Access Control page and assign the administrator role to the RAM user. For more information, see Grant permissions to a RAM user and Manage users and roles. A scripted sketch of the policy attachment is shown after this list.

  • Data Lake Formation (DLF) is activated. For more information, see Getting Started. For information about the regions in which DLF is supported, see Supported regions and endpoints.

  • Object Storage Service (OSS) is activated and a bucket is created. For more information, see Activate OSS and Create a bucket. A bucket-creation sketch is also shown after this list.
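
If you prefer to script the RAM setup, the following is a minimal sketch using the aliyun-python-sdk-ram package. The access key pair and the RAM user name developer are placeholders; the policy names come from the prerequisites above.

```python
from aliyunsdkcore.client import AcsClient
from aliyunsdkram.request.v20150501.AttachPolicyToUserRequest import AttachPolicyToUserRequest

# Placeholder credentials of an account that is allowed to administer RAM.
client = AcsClient('<access_key_id>', '<access_key_secret>', 'cn-hangzhou')

# The three policies listed in the prerequisites.
POLICIES = [
    'AliyunEMRServerlessSparkFullAccess',
    'AliyunOSSFullAccess',
    'AliyunDLFFullAccess',
]

for policy in POLICIES:
    request = AttachPolicyToUserRequest()
    request.set_PolicyType('System')      # All three are system policies.
    request.set_PolicyName(policy)
    request.set_UserName('developer')     # Hypothetical RAM user name.
    client.do_action_with_exception(request)
```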
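
Bucket creation can be scripted in a similar way with the official oss2 Python SDK. This is a sketch that assumes the China (Hangzhou) region and the hypothetical bucket name emr-oss-hdfs; note that it creates a standard OSS bucket, and enabling OSS-HDFS is a separate step that is not covered here.

```python
import oss2

# Placeholder credentials; the endpoint must match the region that you select.
auth = oss2.Auth('<access_key_id>', '<access_key_secret>')
bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', 'emr-oss-hdfs')

# Create a private bucket to hold job logs, running events, and resources.
bucket.create_bucket(oss2.BUCKET_ACL_PRIVATE)
```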

Precautions

The runtime environment of the code is managed and configured by the owner of the environment.

Procedure

  1. Go to the Spark page.

    1. Log on to the EMR console.

    2. In the left-side navigation pane, choose EMR Serverless > Spark.

    3. In the top navigation bar, select a region based on your business requirements.

      Important

      After you create a workspace, you cannot change the region of the workspace.

  2. Click Create Workspace.

  3. On the E-MapReduce Serverless Spark page, configure the parameters that are described in the following list. An example value follows each description.

    • Region: The region in which the workspace resides. We recommend that you select the region where your data is stored. Example: China (Hangzhou).

    • Billing Method: The billing method of the workspace. Only the pay-as-you-go billing method is supported. Example: Pay-as-you-go.

    • Workspace Name: The name of the workspace. The name must be 1 to 60 characters in length, can contain only letters, digits, and hyphens (-), and must start with a letter (see the validation sketch after this list). Example: emr-serverless-spark.

      Note: Workspace names must be unique within the same Alibaba Cloud account. If you enter the name of an existing workspace, the system prompts you to enter a different name.

    • DLF for Metadata Storage: Specifies whether to use DLF to store and manage your metadata. If you turn on the switch, the DLF catalog whose name is the same as the UID of your Alibaba Cloud account is selected by default. If you want different clusters to be associated with different DLF catalogs, create a catalog: click Create catalog, configure the Catalog ID parameter in the popover that appears, and click OK. Then, select the catalog that you created from the drop-down list. Example: emr-dlf.

    • Maximum Quota: The maximum number of compute units (CUs) that can be concurrently used by jobs in the workspace. Example: 100.

    • Workspace Directory: The path in an OSS bucket that is used to store data files, such as job logs, running events, and resources. We recommend that you select a bucket for which OSS-HDFS is enabled to ensure compatibility with native Hadoop Distributed File System (HDFS) interfaces. If HDFS is not involved in your business scenario, you can select a standard OSS bucket. Example: emr-oss-hdfs.

    • Advanced Settings > Execution Role: The role that EMR Serverless Spark assumes to run jobs and to access your resources in other cloud services, such as OSS and DLF. Select AliyunEMRSparkJobRunDefaultRole. Example: AliyunEMRSparkJobRunDefaultRole.
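
    The naming rule above is easy to check before you submit the form. The following is a minimal Python sketch, assuming the rule exactly as stated (1 to 60 characters, letters, digits, and hyphens, starting with a letter):

    ```python
    import re

    # 1-60 chars, letters/digits/hyphens only, must start with a letter.
    WORKSPACE_NAME_RE = re.compile(r'^[A-Za-z][A-Za-z0-9-]{0,59}$')

    for name in ('emr-serverless-spark', '1-starts-with-digit', 'has_underscore'):
        print(name, '->', bool(WORKSPACE_NAME_RE.match(name)))
    ```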

  4. Click Create Workspace.

References

After you create a workspace, you can develop jobs, such as Spark SQL jobs, in the workspace. For more information, see Get started with SQL jobs.
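
A quick way to verify a new workspace is to run a minimal batch job. The following PySpark sketch is illustrative only; it assumes that you submit it as a Spark job in the workspace, and the application name is arbitrary.

```python
from pyspark.sql import SparkSession

# Minimal smoke test: start a session and list the visible databases,
# which come from the DLF catalog if metadata storage is enabled.
spark = SparkSession.builder.appName("workspace-smoke-test").getOrCreate()
spark.sql("SHOW DATABASES").show()
spark.stop()
```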