This topic describes the terms used in E-MapReduce (EMR) Serverless Spark to help you better understand the service.
Term | Description |
--- | --- |
workspace | The basic unit for business development. A workspace contains jobs, computing resources, and permissions that are isolated from those in other workspaces. |
resource queue | EMR Serverless Spark uses the compute unit (CU) as the basic unit to measure computing resources. Whether a Spark compute node is a driver or an executor, you can allocate one or more CUs to the node based on its vCore and memory configuration. EMR Serverless Spark provides a minimum of 20 GiB and a maximum of 160 GiB of local storage space for each compute node. The number of CUs consumed by a job depends on the computation complexity of the job and the distribution of the data on which the job depends. You can view the number of CUs consumed by a job run in the job run list. |
compute | A computing resource that is available in an EMR Serverless Spark workspace. A compute can be associated with a queue and provides the infrastructure that is required to run SQL statements and the Notebook environment. If a compute is not used by any job for 45 minutes, the system automatically terminates the compute to release the resources. On the Compute page, you can change the engine version and the queue with which a compute is associated. You can also modify Spark parameters based on your business requirements. |
draft file | A job that is being developed on the job development page of EMR Serverless Spark. A draft file is incomplete or still needs modification. |
publish | The action of publishing a draft file. To prevent draft files that are still being modified from affecting the scheduling of jobs, publish a draft file only if the file does not need further modification. Publishing draft files helps isolate the development and production environments. |
job run | In the job orchestration system, a job run ID is generated each time a workflow runs. |
workflow | An orderly process that consists of a series of jobs that depend on one another and are run in a specific order. |
user | A term that is used in access control. You can add a Resource Access Management (RAM) user as a member of a workspace and then grant the RAM user the required permissions to manage jobs and resources in the workspace. |
role | A term that is used in access control. One user can assume multiple roles. Multiple users can assume the same role. After you grant permissions to a role, all users who assume this role have the same permissions. |
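
The workflow term in the table above describes jobs that depend on one another and run in a specific order. As an illustration only, the following minimal Python sketch derives one valid run order from a dependency map by using a topological sort from the standard library. The job names (`ingest`, `clean`, `aggregate`, `report`) and the `run_order` helper are hypothetical and are not part of EMR Serverless Spark. The sketch does not show how the service schedules workflows.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each job name maps to the set of jobs it depends on.
workflow = {
    "ingest": set(),
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
}


def run_order(dependencies):
    """Return one order in which every job comes after the jobs it depends on."""
    # static_order() raises CycleError if the dependencies contain a cycle.
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    try:
        print(run_order(workflow))
    except CycleError as err:
        print(f"The workflow contains a circular dependency: {err}")
```

Because the jobs in this example form a single chain, only one order is valid. If several jobs have no dependencies on one another, a topological sort can return any order that respects the declared dependencies.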