E-MapReduce: Scale out an EMR cluster

Last Updated: Jul 08, 2024

You can add core nodes or task nodes to scale out an E-MapReduce (EMR) cluster that has insufficient computing or storage resources.

Prerequisites

An EMR cluster is created. For more information, see Create a cluster.

Limits

  • You cannot scale out the master node group. You can add only core nodes and task nodes to an existing EMR cluster. By default, the configurations of an added node are the same as those of the existing nodes in the same node group.

  • For Hadoop clusters, you cannot scale out a node group that is created on the Auto Scaling tab. For more information, see Configure auto scaling (only for Hadoop clusters).
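The limits above can be expressed as a small pre-check to run before you request a scale-out. The following is a minimal sketch, not part of any EMR SDK: the `NodeGroup` class, its fields, and the `can_scale_out` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NodeGroup:
    """Hypothetical, simplified view of a node group (not an EMR SDK type)."""
    name: str
    node_type: str                          # "MASTER", "CORE", or "TASK"
    cluster_type: str = ""                  # for example "HADOOP"
    created_on_auto_scaling_tab: bool = False


def can_scale_out(group: NodeGroup) -> Tuple[bool, str]:
    """Return (allowed, reason) based on the limits listed in this topic."""
    node_type = group.node_type.upper()
    if node_type == "MASTER":
        return False, "the master node group cannot be scaled out"
    if node_type not in ("CORE", "TASK"):
        return False, "unknown node type: " + group.node_type
    if group.cluster_type.upper() == "HADOOP" and group.created_on_auto_scaling_tab:
        return False, ("node groups created on the Auto Scaling tab of a "
                       "Hadoop cluster cannot be scaled out")
    return True, "ok"


if __name__ == "__main__":
    for g in (NodeGroup("master", "MASTER"),
              NodeGroup("core", "CORE"),
              NodeGroup("task-auto", "TASK", "HADOOP", True)):
        print(g.name, can_scale_out(g))
```

A check like this only mirrors the documented limits; the console remains the authority on whether a node group can be scaled out.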

Precautions

If your cluster contains StarRocks and StarRocks has been manually upgraded, subsequent scale-out operations may cause version inconsistencies in your cluster. To ensure that the system runs as expected, we strongly recommend that you migrate the data and tasks of the cluster that contains StarRocks to EMR Serverless StarRocks.

EMR Serverless StarRocks is compatible with open source StarRocks and is upgraded automatically. This simplifies StarRocks management and avoids the risks of manual StarRocks upgrades.

Procedure

Important

The cluster scale-out operation does not restart the application processes on existing nodes.

  1. Go to the Nodes tab.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. In the top navigation bar, select the region in which your cluster resides and select a resource group based on your business requirements.

    3. On the EMR on ECS page, find the cluster that you want to scale out and click Nodes in the Actions column.

  2. On the Nodes tab, find the desired node group and click Scale Out in the Actions column.

  3. In the Scale Out dialog box, configure the parameters based on your business requirements.

    Parameter: Description

    Node Group Name: The name of the node group.

    Node Type: The type of the node group.

    Current Instance Type: The information about instances in the node group.

    Billing Method: The billing method of the cluster. The billing method of a new node is the same as that of the cluster and cannot be changed.

      If the billing method is Subscription, you can choose whether to turn on Auto-renewal. If you turn on the switch, you can configure the Subscription Duration parameter for the new node.

      Note

      After you turn on Auto-renewal, the subscription nodes that are added are automatically renewed seven days before the expiration date. The default renewal duration is one month. You can change the renewal duration or disable the auto-renewal feature on the Auto-renewal page.

    vSwitch: The information about the vSwitch that is deployed for the node group.

    Current Quantity: The number of instances in the node group.

    Added Instances: The number of instances that you want to add to the node group. Click the upward or downward arrow, or enter a number in the Added Instances field.

    Cluster Expiration Time: The expiration time of the subscription cluster.

    Terms of Service: Read and agree to the terms of service.

  4. Click OK.

    For information about how to log on to a new node, see Log on to a cluster.
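The Auto-renewal note in step 3 (renewal seven days before expiration, with a default duration of one month) can be illustrated with some date arithmetic. This is a minimal sketch: the function names are hypothetical, and clamping the day to the end of a shorter month is an assumption, not documented EMR billing behavior.

```python
import calendar
from datetime import date, timedelta

RENEWAL_LEAD_DAYS = 7        # from the note: renewed seven days before expiration
DEFAULT_RENEWAL_MONTHS = 1   # from the note: default renewal duration


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the last day of the
    target month (assumption; actual billing rules may differ)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def renewal_schedule(expiration: date, months: int = DEFAULT_RENEWAL_MONTHS):
    """Return (date on which auto-renewal happens, expiration date after renewal)."""
    renew_on = expiration - timedelta(days=RENEWAL_LEAD_DAYS)
    return renew_on, add_months(expiration, months)


if __name__ == "__main__":
    renew_on, new_expiration = renewal_schedule(date(2024, 8, 8))
    print("renews on:", renew_on, "new expiration:", new_expiration)
```

For a node that expires on August 8, 2024, this sketch places the renewal on August 1, 2024 and the new expiration on September 8, 2024.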

What to do next

After you scale out a core node group of an EMR cluster that uses Hadoop Distributed File System (HDFS) to store data, data distribution in HDFS may be unbalanced. In this case, you can use HDFS Balancer to redistribute data that is stored on DataNodes. For more information, see HDFS Balancer.
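As an illustration of the rebalancing step, the following sketch builds an `hdfs balancer` command line in Python. The `-threshold` option is the standard balancer setting for the allowed deviation, in percent, of each DataNode's disk usage from the cluster average. The `balancer_command` helper and the choice of 10 percent are assumptions for this example; see the HDFS Balancer topic for the values recommended for EMR.

```python
import shlex
from typing import List


def balancer_command(threshold_percent: float = 10.0) -> List[str]:
    """Build the argument list for `hdfs balancer` (hypothetical helper).

    threshold_percent: allowed deviation of a DataNode's disk usage
    from the cluster average, as a percentage between 1 and 100.
    """
    if not 1 <= threshold_percent <= 100:
        raise ValueError("threshold_percent must be between 1 and 100")
    return ["hdfs", "balancer", "-threshold", format(threshold_percent, "g")]


if __name__ == "__main__":
    cmd = balancer_command(10)
    # Print the command instead of running it. On a cluster node with the
    # HDFS client installed, it could be run with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting issues if you later pass it to `subprocess.run`.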

References

  • If the vCPUs or memory of Elastic Compute Service (ECS) instances in a node group cannot meet your business requirements, you can upgrade the instance configurations of the node group. For more information, see Upgrade node configurations.

  • For information about how to expand data disks for EMR clusters, see Expand a disk.

  • For answers to frequently asked questions about cluster scale-out, see FAQ about cluster management.

  • For information about how to scale out a cluster by calling an API operation, see IncreaseNodes.
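For the API route, a request to IncreaseNodes needs to identify the cluster and node group and state how many nodes to add. The sketch below only assembles and validates such a parameter set. The parameter names (`RegionId`, `ClusterId`, `NodeGroupId`, `IncreaseNodeCount`) are assumptions that you should verify against the IncreaseNodes reference, the IDs are placeholders, and `build_increase_nodes_params` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Dict


def build_increase_nodes_params(region_id: str, cluster_id: str,
                                node_group_id: str,
                                increase_node_count: int) -> Dict[str, Any]:
    """Assemble request parameters for a scale-out call.

    Parameter names are assumptions; check the IncreaseNodes API
    reference before you use them.
    """
    if increase_node_count < 1:
        raise ValueError("increase_node_count must be at least 1")
    for label, value in (("region_id", region_id),
                         ("cluster_id", cluster_id),
                         ("node_group_id", node_group_id)):
        if not value:
            raise ValueError(label + " must not be empty")
    return {
        "RegionId": region_id,
        "ClusterId": cluster_id,
        "NodeGroupId": node_group_id,
        "IncreaseNodeCount": increase_node_count,
    }


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    print(build_increase_nodes_params("cn-hangzhou", "c-example", "ng-example", 2))
```

The resulting dictionary would then be passed to whichever SDK or HTTP client you use to call the operation.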