Configure a priority mapping between a baseline and a YARN queue

Last Updated: Nov 13, 2024

In DataWorks, you can adjust the final YARN queue priority of an E-MapReduce (EMR) node based on the priority mapping between the baseline to which the EMR node belongs and a YARN queue. This topic describes how to configure a priority mapping between a baseline and a YARN queue.

Background information

YARN is a distributed resource management system that manages and schedules resources in an EMR cluster and allocates them to the jobs that run on YARN. In YARN, queue priorities determine which jobs receive resources first: jobs with a higher priority are scheduled ahead of jobs with a lower priority. For more information about YARN, see Overview.

In DataWorks, the YARN queue that is used to schedule an EMR node depends on whether the Global Settings Take Precedence feature is enabled for the workspace to which the node belongs:

  • If the feature is enabled, the YARN queue that is configured at the workspace level is preferentially used to schedule the EMR node.

  • If the feature is not enabled, the YARN queue that is configured for the EMR node is used.
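The queue selection rule can be sketched in a few lines of Python. This is only an illustration of the decision described above, not DataWorks code; the function name select_yarn_queue and its parameters are hypothetical.

```python
def select_yarn_queue(global_settings_take_precedence: bool,
                      workspace_queue: str,
                      node_queue: str) -> str:
    """Return the name of the YARN queue used to schedule an EMR node.

    Hypothetical helper: all names here are illustrative assumptions.
    """
    if global_settings_take_precedence:
        # The workspace-level queue is preferred when the feature is enabled.
        return workspace_queue
    # Otherwise the queue configured on the node itself is used.
    return node_queue


if __name__ == "__main__":
    print(select_yarn_queue(True, "workspace_queue", "node_queue"))
    print(select_yarn_queue(False, "workspace_queue", "node_queue"))
```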

The final priorities of YARN queues that are used to schedule EMR nodes are determined based on the following principles:

  • If a priority mapping between a baseline to which an EMR node belongs and a YARN queue is configured, the final priority of the YARN queue that is used to schedule the node is determined based on the priority mapping.

  • If no priority mapping is configured between a baseline to which an EMR node belongs and a YARN queue, the priority of the YARN queue that is configured for the EMR node is used.
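The two principles above amount to a lookup with a fallback. The following Python sketch illustrates that logic under stated assumptions: the mapping is modeled as a dictionary from baseline priority to YARN queue priority, and the function name, parameter names, and numeric values are hypothetical.

```python
from typing import Mapping, Optional


def final_queue_priority(baseline_priority: Optional[int],
                         priority_mapping: Mapping[int, int],
                         node_queue_priority: int) -> int:
    """Return the final YARN queue priority for an EMR node.

    Hypothetical model: priority_mapping maps a baseline priority to a
    YARN queue priority. baseline_priority is None when the node does
    not belong to a baseline.
    """
    if baseline_priority is not None and baseline_priority in priority_mapping:
        # A mapping exists for the node's baseline, so the mapping decides.
        return priority_mapping[baseline_priority]
    # No mapping is configured: use the priority set on the node's queue.
    return node_queue_priority


if __name__ == "__main__":
    example_mapping = {1: 2, 5: 6}  # illustrative values only
    print(final_queue_priority(5, example_mapping, node_queue_priority=3))
    print(final_queue_priority(3, example_mapping, node_queue_priority=3))
```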

Prerequisites

An EMR node is created, and the priority of the YARN queue that is configured for the EMR node is specified. For more information, see Create an EMR Hive node, Create an EMR Spark node, or Create an EMR Spark SQL node.

Limits

Take note of the following limits when you configure a priority mapping between a baseline and a YARN queue:

  • Feature

    • This feature is available only for EMR Hive nodes, EMR Spark nodes, and EMR Spark SQL nodes.

    • You must configure the highest priority of a YARN queue in the EMR cluster before you can configure a priority mapping between a baseline and the YARN queue.

    • You must log on to the EMR console to modify the YARN queue priority. After you modify the YARN queue priority, you must restart the YARN service. The modified priority takes effect only for a specified YARN queue.

    Note

    For information about how to configure the priority of a YARN queue, see Configure an EMR DataLake cluster.

  • Permissions

    • Only a tenant administrator can configure a priority mapping. If you want to configure a priority mapping by using a member account, the member account must be assigned the tenant administrator role. For more information, see Manage permissions on tenant members.

    • This feature is available at the tenant level. The configured mapping relationship takes effect within the tenant.

    • Only users to which the AliyunDataWorksAccessingEMRReadOnlyPolicy policy is attached can configure a priority mapping. For more information, see Grant permissions to a RAM user.

  • Resource groups

    Exclusive resource groups for scheduling that were purchased before August 31, 2023 do not support this feature. If an exclusive resource group for scheduling that you use to run an EMR node was purchased before this date, contact the technical personnel of DataWorks to upgrade the resource group to make the feature available. If the resource group is not upgraded, the configured mapping will not take effect. In this case, the priority of the YARN queue that is configured for the EMR node is used.
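The resource group limit can be expressed as a simple date check. The Python sketch below is illustrative only: the function name and the upgraded flag are hypothetical, and it assumes that a group purchased on August 31, 2023 itself already supports the feature, because the limit applies to groups purchased before that date.

```python
from datetime import date

# Cutoff stated in the limit above.
FEATURE_CUTOFF = date(2023, 8, 31)


def mapping_takes_effect(purchase_date: date, upgraded: bool) -> bool:
    """Return True if a configured priority mapping would take effect.

    Hypothetical helper: groups purchased before the cutoff need an
    upgrade; otherwise the priority configured for the node is used.
    """
    return purchase_date >= FEATURE_CUTOFF or upgraded


if __name__ == "__main__":
    print(mapping_takes_effect(date(2023, 6, 1), upgraded=False))
    print(mapping_takes_effect(date(2023, 6, 1), upgraded=True))
    print(mapping_takes_effect(date(2024, 1, 15), upgraded=False))
```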

Entry point for configuring a priority mapping

  1. Go to the Operation Center page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Operation Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Operation Center.

  2. In the left-side navigation pane, choose Alarm > Smart Baseline.

Logic for configuring a priority mapping

You can configure a priority mapping on the Baseline Priority Mapping tab on the Smart Baseline page.

Select the cluster and the YARN queue that are used to run an EMR node based on your business requirements, and configure a priority mapping between the baseline to which the EMR node belongs and the YARN queue.

Note

You must log on to the EMR console and obtain the YARN queue information on the Services tab of the desired cluster.

The configuration follows these rules:

  • The configured YARN queue priority cannot exceed the highest YARN queue priority in the EMR cluster.

  • A larger number indicates a higher YARN queue priority. Resources are allocated first to the nodes that run in the YARN queue with the higher priority.

  • The YARN queue priority that is mapped to a lower baseline priority cannot be higher than the YARN queue priority that is mapped to a higher baseline priority.
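The rules in this list can be checked before a mapping is saved. The following Python sketch validates a candidate mapping against them. It is not the DataWorks implementation: the function name and the dictionary model are hypothetical, and it assumes that a larger baseline priority number means a higher baseline priority, which this topic does not state.

```python
from typing import Dict, List


def validate_priority_mapping(mapping: Dict[int, int],
                              cluster_max_priority: int) -> List[str]:
    """Return a list of rule violations; an empty list means valid.

    Hypothetical model: mapping maps baseline priority -> YARN queue
    priority. Assumes a larger baseline number is a higher priority.
    """
    errors: List[str] = []
    ordered = sorted(mapping.items())  # ascending baseline priority

    # Rule 1: no mapped priority may exceed the cluster's highest priority.
    for baseline, queue_priority in ordered:
        if queue_priority > cluster_max_priority:
            errors.append(
                f"baseline {baseline}: queue priority {queue_priority} "
                f"exceeds cluster maximum {cluster_max_priority}")

    # Rule 2: a lower baseline must not map to a higher queue priority
    # than a higher baseline. Checking neighbors is enough once sorted.
    for (low_b, low_q), (high_b, high_q) in zip(ordered, ordered[1:]):
        if low_q > high_q:
            errors.append(
                f"baseline {low_b} maps to {low_q}, which is higher than "
                f"{high_q} mapped to baseline {high_b}")
    return errors


if __name__ == "__main__":
    print(validate_priority_mapping({1: 2, 3: 4, 5: 6}, cluster_max_priority=10))
    print(validate_priority_mapping({1: 5, 3: 4}, cluster_max_priority=10))
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation at once.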

For more information about YARN configurations, see Overview.