DataWorks: Configure a global YARN queue

Last Updated: Apr 01, 2024

You can configure global YARN queues at the workspace level for DataWorks services. The global YARN queues are used to run E-MapReduce (EMR) tasks by default. You can also specify whether a global YARN queue has a higher priority than a YARN queue that you configure to run a single task in a specified DataWorks service. This topic describes how to configure a global YARN queue.

Background information

YARN is a distributed resource management system. It is the core component of the Hadoop system and is used to manage resources in Hadoop clusters and to schedule and monitor jobs in the clusters. For information about EMR YARN, see YARN schedulers.

In DataWorks, you can use one of the following methods to configure YARN queues that are used to schedule nodes:

  • Method 1: Configure a global YARN queue

    You can configure a global YARN queue that is used by a DataWorks service to run EMR tasks at the workspace level, and specify whether the global YARN queue has a higher priority than a YARN queue that you configure to run a single task in the same DataWorks service. For more information, see the Configure a global YARN queue section in this topic.

  • Method 2: Configure a YARN queue to run a single task in a DataWorks service

    • In DataStudio, you can perform the following steps to configure a YARN queue for an EMR Hive or EMR Spark node: Go to the configuration tab of an EMR Hive or EMR Spark node. In the right-side navigation pane, click Advanced Settings. On the Advanced Settings tab, configure the queue parameter to specify a YARN queue that you want to use to run a task on the EMR Hive or EMR Spark node.

    • In Data Quality, you can configure the Queue parameter to specify a YARN queue when you configure a monitoring rule for partitions of an EMR table. For more information, see Configure a monitoring rule for a single table.

    • In other DataWorks services, you cannot specify a YARN queue for a single task.

Limits

  • You can use only the following accounts and roles to configure a YARN queue:

    • An Alibaba Cloud account

    • RAM users or RAM roles to which the AliyunDataWorksFullAccess policy is attached

    • RAM users to which the Workspace Administrator role is assigned

  • You need to modify the maximum application priority in your EMR cluster.

    If you want to change the priority of a YARN queue that is used to run EMR tasks in DataWorks, you must add the yarn.cluster.max-application-priority configuration item to the yarn-site.xml file in your EMR cluster and replace the default value 0 with a larger value. If you do not add the configuration item or replace the default value, the priority setting in DataWorks does not take effect on the EMR tasks.

    Note

    After the modification is complete, you must restart the YARN service for the modification to take effect.

  • You can configure global YARN queues only for DataStudio, Data Quality, DataAnalysis, and Operation Center.
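
As an illustration of the yarn.cluster.max-application-priority requirement in the preceding list, the following Python sketch adds or updates that configuration item in the text of a yarn-site.xml file. It is a minimal sketch under stated assumptions, not the procedure that EMR prescribes: the function name set_max_application_priority and the sample value 10 are hypothetical, and in an EMR cluster you would typically add the item on the configuration page of the YARN service rather than edit the file directly.

```python
import xml.etree.ElementTree as ET

# Configuration item named in this topic; its default value is 0.
PROPERTY_NAME = "yarn.cluster.max-application-priority"


def set_max_application_priority(yarn_site_xml: str, max_priority: int) -> str:
    """Return yarn-site.xml content with the maximum application priority set.

    Hypothetical helper for illustration only. The standard-library parser
    drops XML comments and the XML declaration from the returned text.
    """
    if max_priority <= 0:
        raise ValueError("max_priority must be larger than the default value 0")

    root = ET.fromstring(yarn_site_xml)  # expects a <configuration> root element

    # Update the item if it already exists.
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == PROPERTY_NAME:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = str(max_priority)
            break
    else:
        # Otherwise append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = PROPERTY_NAME
        ET.SubElement(prop, "value").text = str(max_priority)

    return ET.tostring(root, encoding="unicode")


# Sample input: an empty configuration (assumed for the example).
example = "<configuration></configuration>"
updated = set_max_application_priority(example, 10)
print(updated)
```

The sketch only produces the modified file content. The YARN service must still be restarted before the new value takes effect.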

Prerequisites

An EMR cluster is registered to DataWorks. For more information, see Register an EMR cluster to DataWorks.

Configure a global YARN queue

  1. Go to the page for configuring global YARN queues.

    1. Go to the Management Center page.

      Log on to the DataWorks console. In the left-side navigation pane, click Management Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.

    2. In the left-side navigation pane, click Open Source Clusters.

    3. On the Open Source Clusters page, find the desired EMR cluster and click the YARN Resource Queues tab.

  2. Configure a global YARN queue.

    Click Edit YARN Resource Queues in the upper-right corner of the YARN Resource Queues tab to configure global YARN queues and queue priorities for DataWorks services.

    Note

    The configurations take effect globally in the workspace. Make sure that you are in the intended workspace before you configure the parameters.

    Parameters and descriptions:

    • Resource Queue: The global YARN queue that you want to use to run EMR tasks in a DataWorks service. You can go to the EMR on ECS page in the EMR console to obtain the existing YARN queues.

    • Global Settings Take Precedence: Specifies whether the global YARN queue that you configure for a DataWorks service has a higher priority than the YARN queue that you configure to run a single task in the same DataWorks service. If you select Yes, the global YARN queue is used to run tasks in the DataWorks service in the current workspace.

      • Global configuration: Go to the SettingCenter page. In the left-side navigation pane, click Open Source Clusters. On the Open Source Clusters page, find the desired EMR cluster and click the YARN Resource Queues tab.

        Note

        You can configure global YARN queues only for DataStudio, Data Quality, DataAnalysis, and Operation Center.

      • Separate configurations for single tasks in DataWorks services:

        • In DataStudio, you can perform the following steps to configure a YARN queue for an EMR Hive or EMR Spark node: Go to the configuration tab of an EMR Hive or EMR Spark node. In the right-side navigation pane, click Advanced Settings. On the Advanced Settings tab, configure the queue parameter to specify a YARN queue that you want to use to run a task on the EMR Hive or EMR Spark node.

        • In Data Quality, you can configure the Queue parameter to specify a YARN queue when you configure a monitoring rule for partitions of an EMR table. For more information, see Configure a monitoring rule for a single table.

        • In other DataWorks services, you cannot specify a YARN queue for a single task.
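
The interaction between the global queue and a queue set on a single task can be pictured with a short Python sketch. It models only the behavior stated in this topic: the global queue is the default, and it wins when Global Settings Take Precedence is set to Yes. Two behaviors are assumptions that the topic does not confirm: that a queue set on a single task wins when the option is set to No, and that the task queue is used when no global queue is configured. The function name resolve_yarn_queue and the queue names etl and adhoc are hypothetical, and this is not how DataWorks is implemented internally.

```python
from typing import Optional


def resolve_yarn_queue(
    global_queue: Optional[str],
    task_queue: Optional[str],
    global_takes_precedence: bool,
) -> Optional[str]:
    """Pick the YARN queue for one EMR task (illustrative model only)."""
    # "Global Settings Take Precedence" = Yes: the global queue is used.
    if global_queue and global_takes_precedence:
        return global_queue
    # Assumption: otherwise a queue set on the single task is used if present,
    # and the global queue remains the default when the task sets none.
    return task_queue or global_queue


# Hypothetical queue names, for illustration.
print(resolve_yarn_queue("etl", "adhoc", True))
print(resolve_yarn_queue("etl", "adhoc", False))
print(resolve_yarn_queue("etl", None, False))
```

Under these assumptions, a task-level queue affects scheduling only in services that support it (DataStudio and Data Quality) and only while the global setting does not take precedence.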

References

Configure a priority mapping between a baseline and a YARN queue