In DataWorks, you can adjust the final YARN queue priority of an E-MapReduce (EMR) node based on the priority mapping between the baseline to which the EMR node belongs and a YARN queue. This topic describes how to configure a priority mapping between a baseline and a YARN queue.
Background information
YARN is a distributed resource management system that manages and schedules the resources of an EMR cluster and allocates them to the jobs that run on YARN. In YARN, queue priorities determine which jobs receive resources first: jobs with a higher priority are scheduled ahead of jobs with a lower priority. For more information about YARN, see Overview.
In DataWorks, you can use one of the following methods to configure YARN queues that are used to schedule nodes:
Method 1: Configure a global YARN queue. You can configure a YARN queue that is used by a DataWorks service to run EMR nodes on the Workspace page at the workspace level. For more information, see Subsequent operation: Specify YARN queues.
Method 2: Configure a YARN queue for a single node. You can configure a YARN queue for a single EMR node and configure the priority of the YARN queue on the configuration tab of the EMR node. For more information, see Create an EMR Hive node, Create an EMR Spark node, or Create an EMR Spark SQL node.
If the Global Settings Take Precedence feature is enabled for the workspace to which a desired EMR node belongs, the YARN queue that is configured at the workspace level is preferentially used to schedule the EMR node. If the Global Settings Take Precedence feature is not enabled, the YARN queue that is configured for the EMR node is used.
The final priorities of YARN queues that are used to schedule EMR nodes are determined based on the following principles:
If a priority mapping between a baseline to which an EMR node belongs and a YARN queue is configured, the final priority of the YARN queue that is used to schedule the node is determined based on the priority mapping.
If no priority mapping is configured between a baseline to which an EMR node belongs and a YARN queue, the priority of the YARN queue that is configured for the EMR node is used.
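The selection rules above can be sketched in Python. This is an illustrative model only, not DataWorks code: the names NodeQueueConfig, resolve_queue, and resolve_priority are hypothetical, the mapping is assumed to be keyed by (queue name, baseline priority), and falling back to the node-level queue when no workspace-level queue is configured is an assumption that the rules above do not state.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class NodeQueueConfig:
    """Hypothetical model of the queue settings on a single EMR node."""
    queue: str
    priority: int


def resolve_queue(node: NodeQueueConfig,
                  workspace_queue: Optional[str],
                  global_takes_precedence: bool) -> str:
    """Pick the YARN queue that is used to schedule the node."""
    # Workspace-level queue wins only if Global Settings Take Precedence is on.
    # Assumption: if no workspace queue exists, the node-level queue is used.
    if global_takes_precedence and workspace_queue is not None:
        return workspace_queue
    return node.queue


def resolve_priority(node: NodeQueueConfig,
                     queue: str,
                     baseline_priority: Optional[int],
                     mapping: Dict[Tuple[str, int], int]) -> int:
    """Pick the final YARN queue priority for the node."""
    if baseline_priority is not None:
        mapped = mapping.get((queue, baseline_priority))
        if mapped is not None:
            # A configured baseline-to-queue mapping determines the priority.
            return mapped
    # No mapping: the priority configured on the node is used.
    return node.priority


node = NodeQueueConfig(queue="default", priority=2)
mapping = {("default", 8): 5}
queue = resolve_queue(node, workspace_queue=None, global_takes_precedence=False)
print(queue, resolve_priority(node, queue, 8, mapping))
```

In this sketch the node keeps its own priority whenever its baseline has no entry for the selected queue, which mirrors the second principle.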
Prerequisites
An EMR node is created, and the priority of the YARN queue that is configured for the EMR node is specified. For more information, see Create an EMR Hive node, Create an EMR Spark node, or Create an EMR Spark SQL node.
Limits
Take note of the following limits when you configure a priority mapping between a baseline and a YARN queue:
Feature
This feature is available only for EMR Hive nodes, EMR Spark nodes, and EMR Spark SQL nodes.
You must configure the highest priority of a YARN queue in the EMR cluster before you can configure a priority mapping between a baseline and the YARN queue.
You must log on to the EMR console to modify the YARN queue priority. After you modify the YARN queue priority, you must restart the YARN service. The modified priority takes effect only for a specified YARN queue.
Note: For information about how to configure the priority of a YARN queue, see Configure an EMR DataLake cluster.
Permissions
Only a tenant administrator can configure a priority mapping. If you want to configure a priority mapping by using a member account, the member account must be assigned the tenant administrator role. For more information, see Manage permissions on tenant members.
This feature is available at the tenant level. The configured mapping relationship takes effect within the tenant.
Only users to which the AliyunDataWorksAccessingEMRReadOnlyPolicy policy is attached can configure a priority mapping. For more information, see Grant permissions to a RAM user.
Resource groups
Exclusive resource groups for scheduling that were purchased before August 31, 2023 do not support this feature. If an exclusive resource group for scheduling that you use to run an EMR node was purchased before this date, contact the technical personnel of DataWorks to upgrade the resource group to make the feature available. If the resource group is not upgraded, the configured mapping will not take effect. In this case, the priority of the YARN queue that is configured for the EMR node is used.
Entry point for configuring a priority mapping
Go to the Operation Center page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Operation Center.
In the left-side navigation pane, go to the Smart Baseline page.
Logic for configuring a priority mapping
You can configure a priority mapping on the Baseline Priority Mapping tab on the Smart Baseline page.
Select a cluster and a YARN queue that are used to run an EMR node based on your business requirements, and configure a priority mapping between the baseline to which the EMR node belongs and the YARN queue. Configuration logic:
You must log on to the EMR console and obtain the YARN queue information on the Services tab of the desired cluster.
The configured YARN queue priority cannot exceed the highest YARN queue priority in the EMR cluster.
A larger number indicates a higher priority for a YARN queue. Resources are preferentially allocated to schedule the node that is run in the YARN queue with a higher priority.
The YARN queue priority that is mapped to a lower baseline priority cannot be higher than the YARN queue priority that is mapped to a higher baseline priority.
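As a rough illustration of the configuration logic above, the following Python sketch checks a proposed mapping for one YARN queue before you enter it in the console. It is not part of DataWorks: validate_mapping is a hypothetical helper, the mapping is assumed to be a dictionary from baseline priority to YARN queue priority, and a larger baseline priority number is assumed to mean a higher baseline priority.

```python
from typing import Dict, List


def validate_mapping(mapping: Dict[int, int],
                     cluster_max_priority: int) -> List[str]:
    """Return a list of rule violations; an empty list means the mapping is valid.

    mapping: baseline priority -> YARN queue priority (hypothetical layout).
    cluster_max_priority: highest YARN queue priority configured in the EMR cluster.
    """
    errors: List[str] = []
    ordered = sorted(mapping.items())  # ascending baseline priority

    # Rule 1: no mapped priority may exceed the cluster's highest priority.
    for baseline, yarn in ordered:
        if yarn > cluster_max_priority:
            errors.append(
                f"baseline {baseline}: YARN priority {yarn} exceeds "
                f"cluster maximum {cluster_max_priority}"
            )

    # Rule 2: a lower baseline priority must not map to a higher YARN
    # priority than a higher baseline priority. Checking neighbours in
    # sorted order is enough because the ordering is transitive.
    for (low_b, low_y), (high_b, high_y) in zip(ordered, ordered[1:]):
        if low_y > high_y:
            errors.append(
                f"baseline {low_b} maps to {low_y}, which is higher than "
                f"{high_y} mapped to baseline {high_b}"
            )
    return errors


print(validate_mapping({1: 1, 3: 2, 5: 3}, cluster_max_priority=5))
print(validate_mapping({1: 4, 3: 2}, cluster_max_priority=5))
```

The first call returns an empty list because the mapping is non-decreasing and within the cluster limit. The second call reports one violation because baseline priority 1 maps to a higher YARN queue priority than baseline priority 3.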
For more information about YARN configurations, see Overview.