You can configure global YARN queues at the workspace level for DataWorks services. The global YARN queues are used to run E-MapReduce (EMR) tasks by default. You can also specify whether a global YARN queue has a higher priority than a YARN queue that you configure to run a single task in a specified DataWorks service. This topic describes how to configure a global YARN queue.
Background information
YARN is a distributed resource management system. It is the core component of the Hadoop system and is used to manage resources in Hadoop clusters and to schedule and monitor jobs in the clusters. For information about EMR YARN, see YARN schedulers.
In DataWorks, you can use one of the following methods to configure YARN queues that are used to schedule nodes:
Method 1: Configure a global YARN queue
You can configure a global YARN queue that is used by a DataWorks service to run EMR tasks at the workspace level, and specify whether the global YARN queue has a higher priority than a YARN queue that you configure to run a single task in the same DataWorks service. For more information, see the Configure a global YARN queue section in this topic.
Method 2: Configure a YARN queue to run a single task in a DataWorks service
In DataStudio, you can perform the following steps to configure a YARN queue for an EMR Hive or EMR Spark node: Go to the configuration tab of an EMR Hive or EMR Spark node. In the right-side navigation pane, click Advanced Settings. On the Advanced Settings tab, configure the queue parameter to specify a YARN queue that you want to use to run a task on the EMR Hive or EMR Spark node.
In Data Quality, you can configure the Queue parameter to specify a YARN queue when you configure a monitoring rule for partitions of an EMR table. For more information, see Configure a monitoring rule for a single table.
In other DataWorks services, you cannot specify a YARN queue for a single task.
Limits
You can use only the following accounts and roles to configure a YARN queue:
An Alibaba Cloud account
RAM users or RAM roles to which the AliyunDataWorksFullAccess policy is attached
RAM users to which the Workspace Administrator role is assigned
You need to modify the maximum application priority in your EMR cluster.
If you want to change the priority of a YARN queue that is used to run EMR tasks in DataWorks, you must add the yarn.cluster.max-application-priority configuration item to the yarn-site.xml file in your EMR cluster and replace the default value 0 with a larger value. If you do not add the configuration item or replace the default value, the priority setting in DataWorks does not take effect on the EMR tasks.
Note: After the modification is complete, you must restart the YARN service for the modification to take effect.
You can configure global YARN queues only for DataStudio, Data Quality, DataAnalysis, and Operation Center.
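The yarn-site.xml change described in the limits above can be illustrated with a minimal Python sketch. It only shows the shape of the resulting property entry. The helper name set_yarn_property, the sample configuration text, and the value 10 are assumptions made for illustration, and your EMR cluster may require you to apply the change through its own configuration management instead of editing the file directly.

```python
import xml.etree.ElementTree as ET

# Configuration item named in this topic.
PRIORITY_KEY = "yarn.cluster.max-application-priority"


def set_yarn_property(xml_text: str, name: str, value: str) -> str:
    """Add or update a <property> entry in Hadoop-style configuration XML.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            # The item already exists: overwrite its value.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # The item does not exist yet: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Sample content standing in for a real yarn-site.xml file (assumption).
SAMPLE = """<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
</configuration>"""

# 10 is an arbitrary example of a value larger than the default 0.
updated = set_yarn_property(SAMPLE, PRIORITY_KEY, "10")
print(updated)
```

Calling the helper a second time with a different value updates the existing entry instead of adding a duplicate. The YARN service still has to be restarted before the new value takes effect.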
Prerequisites
An EMR cluster is registered to DataWorks. For more information, see Register an EMR cluster to DataWorks.
Configure a global YARN queue
Go to the page for configuring global YARN queues.
Go to the Management Center page.
Log on to the DataWorks console. In the left-side navigation pane, click Management Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.
In the left-side navigation pane, click Open Source Clusters.
On the Open Source Clusters page, find the desired EMR cluster and click the YARN Resource Queues tab.
Configure a global YARN queue.
Click Edit YARN Resource Queues in the upper-right corner of the YARN Resource Queues tab to configure global YARN queues and queue priorities for DataWorks services.
Note: The configurations take effect globally in the workspace. Confirm that you selected the correct workspace before you configure the parameters.
Parameter: Resource Queue
Description: The global YARN queue that you want to use to run EMR tasks in a DataWorks service. You can go to the EMR on ECS page in the EMR console to obtain the existing YARN queues.
Parameter: Global Settings Take Precedence
Description: Specifies whether the global YARN queue that you configure for a DataWorks service has a higher priority than the YARN queue that you configure to run a single task in the same DataWorks service. If you select Yes, the global YARN queue is used to run tasks in the DataWorks service in the current workspace.
Global configuration: Go to the SettingCenter page. In the left-side navigation pane, click Open Source Clusters. On the Open Source Clusters page, find the desired EMR cluster and click the YARN Resource Queues tab.
Note: You can configure global YARN queues only for DataStudio, Data Quality, DataAnalysis, and Operation Center.
Separate configurations for single tasks in DataWorks services:
In DataStudio, you can perform the following steps to configure a YARN queue for an EMR Hive or EMR Spark node: Go to the configuration tab of an EMR Hive or EMR Spark node. In the right-side navigation pane, click Advanced Settings. On the Advanced Settings tab, configure the queue parameter to specify a YARN queue that you want to use to run a task on the EMR Hive or EMR Spark node.
In Data Quality, you can configure the Queue parameter to specify a YARN queue when you configure a monitoring rule for partitions of an EMR table. For more information, see Configure a monitoring rule for a single table.
In other DataWorks services, you cannot specify a YARN queue for a single task.
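The effect of the Global Settings Take Precedence parameter can be sketched in Python as follows. This is an illustrative model, not DataWorks code. The function resolve_queue and the queue names default and etl are hypothetical. The behavior when the parameter is set to No (use the queue configured for the single task if one exists, otherwise fall back to the global queue) is an assumption based on the statement that global YARN queues are used by default.

```python
from typing import Optional


def resolve_queue(
    global_queue: Optional[str],
    task_queue: Optional[str],
    global_takes_precedence: bool,
) -> Optional[str]:
    """Return the YARN queue that a task would run in (hypothetical model)."""
    # Yes: the global queue overrides any queue configured for a single task.
    if global_takes_precedence and global_queue:
        return global_queue
    # No (assumed behavior): prefer the single-task queue, else the global queue.
    return task_queue or global_queue


print(resolve_queue("default", "etl", True))    # global queue is used
print(resolve_queue("default", "etl", False))   # single-task queue is used
print(resolve_queue("default", None, False))    # no single-task queue is set
```

In this model, a queue configured for a single task only matters when Global Settings Take Precedence is set to No.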
References
Configure a priority mapping between a baseline and a YARN queue