E-MapReduce:Terms

Last Updated: Jul 15, 2024

This topic describes the terms of E-MapReduce (EMR) Serverless Spark to help you better understand the service.

Term

Description

workspace

The basic unit for business development. A workspace contains jobs, computing resources, and permissions that are isolated from those in other workspaces.

resource queue

EMR Serverless Spark uses the compute unit (CU) as the basic unit to measure computing resources. 1 CU = 1 CPU core + 4 GiB of memory + local storage.

Regardless of whether a Spark compute node is a driver or an executor, you can allocate one or more CUs to the node based on its vCore and memory configuration. EMR Serverless Spark provides a minimum of 20 GiB and a maximum of 160 GiB of local storage space for each compute node. The number of CUs consumed by a job depends on the computation complexity of the job and the distribution of the data on which the job depends. You can view the number of CUs consumed by a job run in the job run list.
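The CU definition above can be turned into a rough sizing estimate. The following Python sketch is illustrative only: it assumes that a node needs enough CUs to cover both its vCores and its memory, so it takes the larger of the two requirements. That rounding rule, and the function names `cus_for_node` and `cus_for_job`, are assumptions made for this example and are not taken from the EMR Serverless Spark documentation. The CUs actually consumed by a job run are the ones reported in the job run list.

```python
import math

# From the definition above: 1 CU = 1 CPU core + 4 GiB of memory.
CORES_PER_CU = 1
MEMORY_GIB_PER_CU = 4


def cus_for_node(vcores: float, memory_gib: float) -> int:
    """Estimate the CUs needed by one driver or executor node.

    Assumption: the node needs enough CUs to satisfy both the core
    and the memory requirement, so the larger of the two wins.
    """
    if vcores <= 0 or memory_gib <= 0:
        raise ValueError("vcores and memory_gib must be positive")
    by_cores = math.ceil(vcores / CORES_PER_CU)
    by_memory = math.ceil(memory_gib / MEMORY_GIB_PER_CU)
    return max(by_cores, by_memory)


def cus_for_job(driver: tuple, executor: tuple, num_executors: int) -> int:
    """Estimate the CUs allocated to one driver plus N identical executors.

    `driver` and `executor` are (vcores, memory_gib) pairs.
    """
    return cus_for_node(*driver) + num_executors * cus_for_node(*executor)


if __name__ == "__main__":
    # One 4-core/16 GiB driver and ten 2-core/8 GiB executors.
    print(cus_for_job(driver=(4, 16), executor=(2, 8), num_executors=10))
```

Under this assumption, a node with 2 vCores and 16 GiB of memory is estimated at 4 CUs because memory is the limiting dimension. The estimate covers the allocation at one point in time, not the total consumption over the lifetime of a job.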

compute

A computing resource that is available in an EMR Serverless Spark workspace. A compute can be associated with a queue and provides the infrastructure that is required to run SQL statements and the Notebook environment.

If no job uses a compute for 45 minutes, the system automatically terminates the compute to release its resources. On the Compute page, you can change the engine version of a compute and the queue with which the compute is associated. You can also modify Spark parameters based on your business requirements.

draft file

A job that is still being developed on the job development page of EMR Serverless Spark. A draft file is incomplete or needs further modification.

publish

The action of publishing a draft file. To prevent draft files that are still being modified from affecting the scheduling of jobs, publish a draft file only if the file does not need further modification. Publishing draft files helps isolate the development and production environments.

job run

In the job orchestration system, a job run ID is generated each time a workflow runs.

workflow

An orderly process that consists of a series of jobs that depend on one another and are run in a specific order.
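The idea of jobs that depend on one another and run in a specific order can be sketched as a dependency graph that is sorted topologically. The following Python example uses the standard library `graphlib` module (Python 3.9 or later). The job names and the `run_order` helper are hypothetical, and the sketch does not represent how EMR Serverless Spark schedules workflows internally.

```python
from graphlib import CycleError, TopologicalSorter


def run_order(dependencies: dict) -> list:
    """Return one valid execution order for a workflow.

    `dependencies` maps each job name to the set of jobs that must
    finish before it can start. Raises ValueError if the jobs depend
    on one another in a cycle, because no valid order exists then.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        raise ValueError(f"workflow contains a dependency cycle: {exc.args[1]}") from exc


if __name__ == "__main__":
    # Hypothetical three-step workflow: extract, then load, then report.
    workflow = {
        "extract": set(),
        "load": {"extract"},
        "report": {"load"},
    }
    print(run_order(workflow))
```

A topological order guarantees that every job appears after all the jobs it depends on, which matches the "specific order" in the definition above.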

user

A term that is used in access control. You can add a Resource Access Management (RAM) user as a member of a workspace and then grant the RAM user the required permissions to manage jobs and resources in the workspace.

role

A term that is used in access control. One user can assume multiple roles. Multiple users can assume the same role. After you grant permissions to a role, all users who assume this role have the same permissions.
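The many-to-many relationship between users and roles described above can be illustrated with a small Python sketch. The user names, role names, and permission strings below are hypothetical placeholders and are not actual EMR Serverless Spark or RAM identifiers. The sketch only shows that the permissions of a user are the union of the permissions granted to every role that the user assumes.

```python
def effective_permissions(user: str, user_roles: dict, role_permissions: dict) -> set:
    """Return the union of the permissions of all roles that a user assumes."""
    permissions = set()
    for role in user_roles.get(user, set()):
        permissions |= role_permissions.get(role, set())
    return permissions


# Hypothetical roles and the permissions granted to each of them.
ROLE_PERMISSIONS = {
    "developer": {"job:edit", "job:run"},
    "viewer": {"job:view"},
}

# One user can assume multiple roles, and multiple users can assume the same role.
USER_ROLES = {
    "alice": {"developer", "viewer"},
    "bob": {"viewer"},
    "carol": {"viewer"},
}

if __name__ == "__main__":
    for name in sorted(USER_ROLES):
        print(name, sorted(effective_permissions(name, USER_ROLES, ROLE_PERMISSIONS)))
```

In this sketch, `bob` and `carol` assume the same role and therefore have identical permissions, while `alice` has the combined permissions of both of her roles.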