Alibaba Cloud E-MapReduce (EMR) allows you to build and run open source big data frameworks such as Hadoop, Spark, Hive, and Presto for large-scale data processing and analysis. This topic describes how to create an EMR cluster on the EMR on ECS page in the EMR console.
If you create your first EMR cluster after 17:00 (UTC+8) on December 19, 2022, you cannot create a Hadoop, Data Science, Presto, or ZooKeeper cluster.
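The notice above comes down to a single timestamp comparison, and the UTC+8 time zone is easy to get wrong. The following is a minimal Python sketch, not part of any EMR tool or API. `is_restricted`, `CUTOFF`, and `RESTRICTED_TYPES` are hypothetical names, and the sketch assumes that "after 17:00" means strictly later than 17:00:00 (UTC+8). Because the cutoff is timezone-aware, a first-creation time recorded in any time zone is compared as an absolute instant.

```python
from datetime import datetime, timedelta, timezone

# UTC+8, the time zone used in the notice.
UTC_PLUS_8 = timezone(timedelta(hours=8))

# Cutoff stated in the notice: 17:00 (UTC+8) on December 19, 2022.
CUTOFF = datetime(2022, 12, 19, 17, 0, tzinfo=UTC_PLUS_8)

# Cluster types that are unavailable if your first cluster is created after the cutoff.
RESTRICTED_TYPES = {"Hadoop", "Data Science", "Presto", "ZooKeeper"}


def is_restricted(cluster_type: str, first_cluster_time: datetime) -> bool:
    """Return True if the notice would block this cluster type.

    first_cluster_time is the time at which your first EMR cluster is created.
    It must be timezone-aware so that it can be compared with the cutoff.
    """
    if first_cluster_time.tzinfo is None:
        raise ValueError("first_cluster_time must be timezone-aware")
    return first_cluster_time > CUTOFF and cluster_type in RESTRICTED_TYPES


# 09:30 UTC is 17:30 (UTC+8), which is after the cutoff.
print(is_restricted("Hadoop", datetime(2022, 12, 19, 9, 30, tzinfo=timezone.utc)))
```

For example, 08:59 UTC on December 19, 2022 is 16:59 (UTC+8), which is still before the cutoff.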
Prerequisites
RAM authorization is complete. For more information, see Assign roles to an Alibaba Cloud account.
Precautions
If you create a DataLake cluster in the new data lake scenario, a Dataflow cluster, a DataServing cluster, or a custom cluster of EMR V5.12.1, EMR V3.46.1, or a minor version later than EMR V5.12.1 or EMR V3.46.1, and the services that you select do not depend on the nodes in the newly added task node group, you can remove the task node group. To remove it, click Remove Node Group in the Actions column of the task node group in the Node Group section.
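The eligibility rule above combines three conditions: the cluster type, a version threshold for each release line, and whether the selected services depend on the task nodes. The following Python sketch encodes one reading of the rule. It rests on several assumptions: the version condition applies to every listed cluster type, "a minor version later than" covers only later releases in the same V5.x or V3.x line, and the type labels and the `can_remove_task_node_group` helper are hypothetical. Neither is part of the EMR console or API.

```python
import re

# Hypothetical labels for the cluster types named in the precaution.
ELIGIBLE_TYPES = {"DataLake", "Dataflow", "DataServing", "Custom"}

# Lowest qualifying version in each release line (V5.x and V3.x).
MIN_VERSION_BY_MAJOR = {5: (5, 12, 1), 3: (3, 46, 1)}


def parse_version(label: str) -> tuple:
    """Extract (major, minor, patch) from a label such as 'EMR-5.12.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized EMR version label: {label!r}")
    return tuple(int(part) for part in match.groups())


def can_remove_task_node_group(cluster_type: str, version_label: str,
                               services_need_task_nodes: bool) -> bool:
    """Return True if Remove Node Group is expected to apply under this reading."""
    version = parse_version(version_label)
    minimum = MIN_VERSION_BY_MAJOR.get(version[0])
    return (
        cluster_type in ELIGIBLE_TYPES
        and minimum is not None          # release lines other than V5.x and V3.x are not covered
        and version >= minimum           # tuples compare element by element
        and not services_need_task_nodes
    )


print(can_remove_task_node_group("Dataflow", "EMR-5.12.1", False))
```

Tuple comparison keeps the threshold check readable. For example, (5, 13, 0) >= (5, 12, 1) holds, but (5, 11, 9) >= (5, 12, 1) does not.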
Procedure
Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
In the top navigation bar, select the region where you want to create a cluster and select a resource group based on your business requirements.
The region of a cluster cannot be changed after the cluster is created.
By default, all resource groups in your account are displayed.
On the EMR on ECS page, click Create Cluster.
Configure the cluster as prompted.
To create a cluster, you configure the software parameters, hardware parameters, and basic parameters, and then confirm the order.
Note After a cluster is created, you cannot modify its parameters except for the cluster name. Make sure that all parameters are correctly configured when you create a cluster.
After you verify that all configurations are correct, read the terms of service and select the check box.
Click Confirm.
Important Pay-as-you-go clusters: The cluster is created immediately. After it is created, the cluster is in the Running state.
Subscription clusters: An order is generated. The cluster will be created after you complete the payment.
Parameter description
Software parameters
Parameter | Description |
Region | The geographic location where the Elastic Compute Service (ECS) instances of the cluster are located. To ensure minimal network latency, select a region that is close to your geographical location. After the cluster is created, you cannot change the region. Select a region from the drop-down list. |
Business Scenario | Select a business scenario based on your business requirements. Valid values:
|
Product Version | The version of EMR. For more information, see Overview. |
High Service Availability | By default, this switch is turned off. If you turn on the switch, multiple master nodes are created in the cluster to ensure the high availability of the ResourceManager and NameNode processes. In addition, EMR distributes the master nodes across different underlying hardware devices to reduce the risk of failures. |
Optional Services (Select One At Least) | The services that you can select for the cluster. You can select services based on your business requirements. The processes related to the services that you select are automatically started. Important
|
Collect Service Operational Logs | Specifies whether to enable log collection for all services. By default, this switch is turned on to collect the service operational logs of your cluster. The logs are used only for cluster diagnostics. After you create a cluster, you can modify the Collection Status of Service Operational Logs parameter on the Basic Information tab. Important If you turn off this switch, the EMR cluster health check and service-related technical support are limited. For more information about how to disable log collection and the impact of doing so, see How do I stop collection of service operational logs? |
Metadata | The method for storing and managing metadata. Valid values:
|
Root Storage Directory of Cluster | The root storage directory of cluster data. This parameter is required only if you select the OSS-HDFS service. Important If you click Create OSS-HDFS Bucket to create a bucket, you can read data from or write data to the bucket only in the EMR console. You cannot perform operations on the bucket in the OSS console or by using a specified API. The first time you use OSS-HDFS, you must complete authorization as prompted. If you use a RAM user, you must attach the AliyunEMRDlsFullAccess policy and assign the AliyunOSSDlsDefaultRole and AliyunEMRDlsDefaultRole roles to the RAM user by using your Alibaba Cloud account. For more information, see Grant permissions to RAM users. Select a bucket for which OSS-HDFS is enabled in the same region, or click Create OSS-HDFS Bucket to create an OSS-HDFS bucket as the root storage path of the cluster. Note
|
Hardware parameters
Parameter | Description |
Billing Method | The billing method of the cluster. Subscription is selected by default. EMR supports the following billing methods:
|
Zone | The zone where you want to create a cluster. A zone in a region is a physical area with independent power supplies and network facilities. Clusters in zones within the same region can communicate with each other over an internal network. In most cases, you can use the zone that is selected by default. |
VPC | The virtual private cloud (VPC) where you want to deploy the cluster. A VPC is a logically isolated network over which you have full control. You can select an existing VPC or click create a VPC to create a VPC in the VPC console. For more information, see Create and manage a VPC. Note The internal IP address of the cluster is associated with the VPC. Therefore, you cannot modify the internal IP address after the cluster is created. |
vSwitch | The vSwitch of the cluster. A vSwitch is a basic component of a VPC and is used to establish network communication between cloud resources. You can select an existing vSwitch or click Create vSwitch to create a vSwitch in the VPC console. For more information, see Create and manage a vSwitch. |
Default Security Group | The security group of the cluster. A security group is a virtual firewall that is used to control the inbound and outbound traffic of instances in the security group. For more information, see Overview. You can select an existing security group or click create a new security group to create a security group in the ECS console. For more information, see Create a security group. Important Do not use an advanced security group that is created in the ECS console. |
Node Group | The node groups of the cluster. You can select instance types based on your business requirements. For more information, see Instance families.
|
Basic parameters
Parameter | Description |
Cluster Name | The name of the cluster. The name must be 1 to 64 characters in length and can contain only letters, digits, hyphens (-), and underscores (_). |
Identity Credentials | The credentials that are used to log on to the master node of the cluster. For more information, see Log on to a cluster. Valid values:
|
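The Cluster Name rule above maps directly onto a regular expression. The following is a minimal Python sketch for checking a name before you enter it in the console. `is_valid_cluster_name` is a hypothetical helper, and the sketch assumes that "letters" means the ASCII letters A to Z and a to z.

```python
import re

# 1 to 64 characters drawn only from letters, digits, hyphens (-), and
# underscores (_). "Letters" is assumed to mean ASCII A-Z and a-z.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_cluster_name(name: str) -> bool:
    """Check a proposed Cluster Name against the documented rule."""
    # fullmatch anchors both ends, so any disallowed character
    # (a space, a dot, a trailing newline) makes the match fail.
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["emr-prod_01", "", "my cluster", "emr.prod"]:
    print(f"{candidate!r}: {is_valid_cluster_name(candidate)}")
```

The Cluster Template Name parameter in the order confirmation step follows the same rule, so the same check applies there.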
Order confirmation
Optional. If a key pair is used for identity authentication, you can click Save as Cluster Template to save the configurations of the current cluster as a cluster template.
In the Save as Cluster Template dialog box, configure the Cluster Template Name and Cluster Template Resource Group parameters.
Parameter | Description |
Cluster Template Name | Enter a cluster template name to facilitate template management. The name must be 1 to 64 characters in length and can contain only letters, digits, hyphens (-), and underscores (_). |
Cluster Template Resource Group | Select an existing resource group based on your business requirements to manage cluster templates by group. If you want to use a new resource group, click Create Resource Group to create one. For more information, see Create a resource group. |
Click OK.
A cluster template is created in the Manage Cluster Templates panel. For more information about cluster templates, see Create a cluster template.
References
For information about cluster-related issues, see FAQ about cluster management.
For information about how to add services to an existing cluster, see Add services.
For information about how to log on to a cluster, see Log on to a cluster.
For information about how to select an instance type, see ECS instances.
For information about component-related issues, see FAQ.
For information about how to create a cluster by calling an API operation, see CreateCluster.