E-MapReduce: Manage node groups

Last Updated: Sep 18, 2024

This topic describes how to create, modify, and delete a node group for DataLake, Dataflow, online analytical processing (OLAP), DataServing, and custom clusters.

Background information

Node groups are the key resources used to manage the nodes in an E-MapReduce (EMR) cluster. In most cases, a node group consists of Elastic Compute Service (ECS) instances of the same instance type, which lets you manage nodes in batches. You can specify the instance type of each node group based on your business requirements. For example, you can create a node group that consists of memory optimized instances to handle offline big data jobs, and a node group that consists of compute optimized instances to run model training jobs. The vCore-to-memory ratio is 1:8 for a memory optimized instance and 1:2 for a compute optimized instance.
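The ratios above can be turned into a quick sizing calculation. The following is a minimal sketch, assuming the memory side of the ratio is expressed in GiB per vCore; the function and constant names are illustrative and are not part of any EMR API.

```python
# Illustrative sizing helper; names are hypothetical, not an EMR API.
MEMORY_OPTIMIZED_GIB_PER_VCORE = 8   # ratio 1:8, unit assumed to be GiB
COMPUTE_OPTIMIZED_GIB_PER_VCORE = 2  # ratio 1:2, unit assumed to be GiB


def memory_gib(vcores: int, gib_per_vcore: int) -> int:
    """Return the memory implied by a vCore count and a 1:gib_per_vcore ratio."""
    if vcores <= 0 or gib_per_vcore <= 0:
        raise ValueError("vcores and gib_per_vcore must be positive")
    return vcores * gib_per_vcore


# A 16-vCore instance under each ratio.
print(memory_gib(16, MEMORY_OPTIMIZED_GIB_PER_VCORE))   # 128
print(memory_gib(16, COMPUTE_OPTIMIZED_GIB_PER_VCORE))  # 32
```

At the same vCore count, a memory optimized node provides four times the memory of a compute optimized node, which is why the two families suit different workloads.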

For information about how to manage node groups for Hadoop, Data Science, and EMR Studio clusters, see Manage node groups (Hadoop, Data Science, and EMR Studio clusters).

Limits

  • This topic applies only to DataLake, Dataflow, OLAP, DataServing, and custom clusters.

  • Task node groups whose billing method is Pay-as-you-go or Preemptible Instance do not support the configuration upgrade operation.

    For information about a configuration upgrade, see Upgrade node configurations.
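The second limit can be expressed as a simple check. This is a sketch that encodes only the limit stated above; the function name is hypothetical, and other restrictions on configuration upgrades may apply that are not modeled here.

```python
# Hypothetical helper that encodes only the limit stated in this topic.
BLOCKED_BILLING_METHODS = {"Pay-as-you-go", "Preemptible Instance"}


def violates_upgrade_limit(node_group_type: str, billing_method: str) -> bool:
    """Return True if the node group falls under the stated upgrade limit."""
    return node_group_type == "TASK" and billing_method in BLOCKED_BILLING_METHODS


print(violates_upgrade_limit("TASK", "Pay-as-you-go"))  # True
print(violates_upgrade_limit("TASK", "Subscription"))   # False
print(violates_upgrade_limit("CORE", "Pay-as-you-go"))  # False
```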

Create a node group

  1. Go to the Nodes tab.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. In the top navigation bar, select the region in which your cluster resides and select a resource group based on your business requirements.

    3. On the EMR on ECS page, find the cluster that you want to manage and click Nodes in the Actions column.

  2. On the Nodes tab, click Add Node Group.

  3. In the Add Node Group panel, configure the parameters. The following table describes the parameters.

    Parameter

    Description

    Node Group Type

    The type of node group that you can create. Valid values:

    • CORE (Core Node Group)

    • TASK (Task Node Group)

    • GATEWAY (Task Submission Group) (available only for DataLake and Dataflow clusters of EMR V5.10.1 or later)

    • MASTER-EXTEND (Load Expansion Group) (available only for high-availability clusters of EMR V3.51.1 or a later minor version and of EMR V5.17.1 or a later minor version)

      If the load on the master node of a cluster is high, you can add Master-Extend node groups to deploy service components on different node groups. This helps reduce the load on the master node.

      Note

      After you add a service to the cluster, the service components are not deployed on the Master-Extend node group by default. If you want the components of a service to be deployed on the Master-Extend node group, you can select the components that you want to deploy based on your business requirements when you add a Master-Extend node group.

    Billing Method

    The billing method of the node group. Supported billing methods are Pay-as-you-go, Preemptible Instance, and Subscription.

    Note

    Only task node groups support the Preemptible Instance billing method.

    Node Group Name

    The name of the node group. The name of a node group must be unique.

    Components

    You can select the components of a service that you want to deploy only if you select MASTER-EXTEND (Load Expansion Group) for the Node Group Type parameter.

    You can deploy the components of the following services:

    • Hive: HiveMetaStore and HiveServer

    • Kyuubi: KyuubiServer

    • Spark: SparkHistoryServer and SparkThriftServer

    Assign Public Network IP

    Specifies whether to enable Internet access for the node group. After you turn on the switch, all nodes in the node group are connected to the Internet.

    vSwitch

    Select a vSwitch in the current virtual private cloud (VPC). You cannot change the vSwitch after the node group is created.

    Note

    You must select a vSwitch that resides in the same zone as the cluster.

    Additional Security Group

    Optional. Associate the node group with additional security groups.

    You can associate up to four additional security groups with this node group.

    Instance Type

    The instance type for the node group. You can select an instance type based on your business requirements.

    • If the billing method of the node group is Subscription, you can select only one instance type.

    • If the billing method is Pay-as-you-go or Preemptible Instance and the node group consists of task nodes, you can select up to 10 instance types that have the same vCore-to-memory ratio.

    Storage Configuration

    • System Disk: Select an enhanced SSD (ESSD) or an ultra disk based on your business requirements. Valid values: 60 to 500. Unit: GiB. We recommend that you set the size to at least 120 GiB.

    • Data Disk: Select ESSDs or ultra disks based on your business requirements. Valid values: 40 to 32768. Unit: GiB. We recommend that you set the size to at least 80 GiB.

    Note

    If you select enhanced SSDs, you can specify different performance levels (PLs) for the enhanced SSDs based on the disk capacity to meet different cluster performance requirements. The default performance level is PL1. When you configure the system disk, you can select an enhanced SSD of the following performance levels: PL0, PL1, and PL2. When you configure data disks, you can select enhanced SSDs of the following performance levels: PL0, PL1, PL2, and PL3. For more information, see Disks.

    Scaling Policy

    Note

    This parameter is available only when Billing Method is set to Preemptible Instance.

    • Priority-based Policy (default)

      The system attempts to use the specified instance types sequentially to create a node until the node is successfully created. The actual instance types used to create nodes are subject to inventory availability.

    • Cost Optimization Policy

      When a scale-out activity is triggered, Auto Scaling preferentially creates ECS instances that have the lowest vCore price. When a scale-in activity is triggered, Auto Scaling preferentially removes ECS instances that have the highest vCore price. If you select Preemptible Instance as the billing method in the scaling configuration, Auto Scaling preferentially creates preemptible instances. If preemptible instances cannot be created due to insufficient resources, Auto Scaling creates pay-as-you-go instances.

      For more information, see Cost optimization policy.

    Graceful Shutdown

    Note

    This parameter is available only for clusters that have YARN deployed.

    After you enable graceful shutdown, the system waits for the jobs on the nodes to complete or time out before it scales in the nodes. To change the graceful shutdown timeout period, configure the yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs parameter on the YARN service page.

  4. Click OK.

    After a node group is created, you can find the node group on the Nodes tab.
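The constraints in the parameter table can be checked before you fill in the Add Node Group panel. The following is a minimal sketch of such a pre-check, not an EMR API: the `NodeGroupSpec` class, its field names, and the `validate` function are hypothetical, and the instance type in the example is only illustrative. The check that all selected instance types share the same vCore-to-memory ratio is omitted because it requires instance specifications that are not given in this topic.

```python
from dataclasses import dataclass, field
from typing import List

NODE_GROUP_TYPES = {"CORE", "TASK", "GATEWAY", "MASTER-EXTEND"}
BILLING_METHODS = {"Pay-as-you-go", "Preemptible Instance", "Subscription"}


@dataclass
class NodeGroupSpec:
    """Hypothetical container for the values entered in the panel."""
    node_group_type: str
    billing_method: str
    instance_types: List[str]
    system_disk_gib: int
    data_disk_gib: int
    additional_security_groups: List[str] = field(default_factory=list)


def validate(spec: NodeGroupSpec) -> List[str]:
    """Return a list of violated constraints; an empty list means none found."""
    problems: List[str] = []
    if spec.node_group_type not in NODE_GROUP_TYPES:
        problems.append(f"Unknown node group type: {spec.node_group_type}")
    if spec.billing_method not in BILLING_METHODS:
        problems.append(f"Unknown billing method: {spec.billing_method}")
    if spec.billing_method == "Preemptible Instance" and spec.node_group_type != "TASK":
        problems.append("Only task node groups support Preemptible Instance")
    if not spec.instance_types:
        problems.append("Select at least one instance type")
    elif spec.billing_method == "Subscription" and len(spec.instance_types) > 1:
        problems.append("Subscription node groups allow only one instance type")
    elif (spec.node_group_type == "TASK"
          and spec.billing_method in {"Pay-as-you-go", "Preemptible Instance"}
          and len(spec.instance_types) > 10):
        problems.append("Task node groups allow at most 10 instance types")
    if not 60 <= spec.system_disk_gib <= 500:
        problems.append("System disk size must be 60 to 500 GiB")
    if not 40 <= spec.data_disk_gib <= 32768:
        problems.append("Data disk size must be 40 to 32768 GiB")
    if len(spec.additional_security_groups) > 4:
        problems.append("At most four additional security groups are allowed")
    return problems


# Example: a core node group that breaks two of the constraints.
spec = NodeGroupSpec(
    node_group_type="CORE",
    billing_method="Preemptible Instance",
    instance_types=["ecs.g7.xlarge"],  # illustrative instance type
    system_disk_gib=40,
    data_disk_gib=80,
)
for problem in validate(spec):
    print(problem)
```

The example reports the billing method and the system disk size as problems. A task node group that uses the same instance type with a 120 GiB system disk passes all of the checks.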

Modify a node group

  1. On the Nodes tab, find the node group that you want to modify and click the name of the node group in the Node Group Name / ID column.

  2. In the Node Group Attributes panel, modify the parameters that are relevant to the node group.

Delete a node group

Important

You can delete a task or core node group only when the Status column of the node group displays Running and the Number of Nodes column displays 0.

  1. On the Nodes tab, find the node group that you want to delete, move the pointer over the More icon in the Actions column, and then select Delete Node Group.

  2. In the message that appears, click Delete.
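The precondition in the Important note can be written as a short check. This is a sketch for task and core node groups only; the function name and arguments are hypothetical and mirror the Status and Number of Nodes columns on the Nodes tab.

```python
def meets_delete_precondition(status: str, number_of_nodes: int) -> bool:
    """Return True if a task or core node group can be deleted.

    Mirrors the stated rule: Status is Running and Number of Nodes is 0.
    """
    return status == "Running" and number_of_nodes == 0


print(meets_delete_precondition("Running", 0))  # True
print(meets_delete_precondition("Running", 3))  # False
```

If the check fails because the node group still contains nodes, scale in the node group to 0 nodes before you delete it.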

Cost optimization policy

You can develop a detailed cost optimization policy to achieve a balance between cost and stability.

Parameter

Description

Minimum Pay-As-You-Go Nodes in Auto Scaling Group

The minimum number of pay-as-you-go instances required by the auto scaling group. If the number of pay-as-you-go instances in the auto scaling group drops below this value, pay-as-you-go instances are preferentially created.

Percentage of Pay-As-You-Go Nodes

The proportion of pay-as-you-go instances in the auto scaling group after the number of existing pay-as-you-go instances reaches the value of Minimum Pay-As-You-Go Nodes in Auto Scaling Group.

Lowest-Cost Instance Types

The number of instance types that have the lowest prices. If preemptible instances are required, the system evenly creates preemptible instances based on the instance types that have the lowest prices. The maximum value is 3.

Replace Preemptible Instances

Specifies whether to enable preemptible instance replacement. If this switch is turned on, the system automatically replaces an existing preemptible instance with a new preemptible instance about five minutes before the existing instance is reclaimed.
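The behavior of the Lowest-Cost Instance Types parameter can be sketched as follows. This is an illustration under stated assumptions, not the Auto Scaling implementation: the instance type names and prices are made up, the minimum value of 1 is assumed, and the way a remainder is assigned when the node count does not divide evenly (cheapest types first) is also an assumption.

```python
from typing import Dict, List


def pick_lowest_cost_types(price_per_vcore: Dict[str, float], count: int) -> List[str]:
    """Return the `count` instance types that have the lowest vCore price."""
    if not 1 <= count <= 3:  # the maximum value stated in the table is 3
        raise ValueError("count must be between 1 and 3")
    # Sort by price, then by name so that ties are resolved deterministically.
    ranked = sorted(price_per_vcore, key=lambda t: (price_per_vcore[t], t))
    return ranked[:count]


def spread_evenly(total: int, instance_types: List[str]) -> Dict[str, int]:
    """Distribute `total` preemptible nodes evenly across the given types."""
    base, extra = divmod(total, len(instance_types))
    # Assumption: leftover nodes go to the cheapest types first.
    return {t: base + (1 if i < extra else 0) for i, t in enumerate(instance_types)}


# Hypothetical vCore prices for four candidate instance types.
prices = {"type-a": 0.030, "type-b": 0.025, "type-c": 0.040, "type-d": 0.028}

cheapest = pick_lowest_cost_types(prices, 2)
print(cheapest)                    # ['type-b', 'type-d']
print(spread_evenly(5, cheapest))  # {'type-b': 3, 'type-d': 2}
```

Spreading preemptible instances across several low-cost instance types reduces the impact when the inventory of a single instance type runs out or its instances are reclaimed.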

If you do not specify the Minimum Pay-As-You-Go Nodes in Auto Scaling Group, Percentage of Pay-As-You-Go Nodes, or Lowest-Cost Instance Types parameter, the scaling group is a general cost optimization scaling group. If you specify these parameters, the scaling group is a mixed-instance cost optimization scaling group. The two types of cost optimization scaling groups are fully compatible with each other in terms of interfaces and features.

You can use a mixed-instance cost optimization scaling group to achieve the same effect as a specific general cost optimization scaling group by configuring appropriate mixed-instance policies. Examples:

  • In a general cost optimization scaling group, only pay-as-you-go instances are created.

    In your mixed-instance cost optimization scaling group, set Minimum Pay-As-You-Go Nodes to 0, Percentage of Pay-As-You-Go Nodes to 100, and Lowest-Cost Instance Types to 1.

  • In a general cost optimization scaling group, preemptible instances are preferentially created.

    In your mixed-instance cost optimization scaling group, set Minimum Pay-As-You-Go Nodes to 0, Percentage of Pay-As-You-Go Nodes to 0, and Lowest-Cost Instance Types to 1.
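The two examples above can be checked with a small calculation of how a target node count is split between pay-as-you-go and preemptible instances. This is a minimal sketch, not the Auto Scaling implementation: the function name and arguments are hypothetical, and rounding the pay-as-you-go share up to a whole node is an assumption.

```python
from typing import Tuple


def split_capacity(total_nodes: int, min_payg_nodes: int, payg_percentage: int) -> Tuple[int, int]:
    """Return (pay_as_you_go, preemptible) node counts for a target size.

    min_payg_nodes  -> Minimum Pay-As-You-Go Nodes in Auto Scaling Group
    payg_percentage -> Percentage of Pay-As-You-Go Nodes (0 to 100)
    """
    if total_nodes < 0 or min_payg_nodes < 0:
        raise ValueError("node counts must not be negative")
    if not 0 <= payg_percentage <= 100:
        raise ValueError("payg_percentage must be between 0 and 100")
    # Fill the pay-as-you-go minimum first.
    base = min(total_nodes, min_payg_nodes)
    above_base = total_nodes - base
    # Apply the percentage to the remainder; integer ceiling (assumed rounding).
    payg_above_base = (above_base * payg_percentage + 99) // 100
    payg = base + payg_above_base
    return payg, total_nodes - payg


print(split_capacity(10, 0, 100))  # (10, 0): only pay-as-you-go instances
print(split_capacity(10, 0, 0))    # (0, 10): only preemptible instances
print(split_capacity(10, 2, 50))   # (6, 4): 2 minimum + half of the other 8
```

The first two calls correspond to the two example configurations. The third call shows a mixed policy in which the minimum is satisfied first and the percentage applies only to the nodes above the minimum.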
