Alibaba Cloud E-MapReduce (EMR) helps you build and run open source big data frameworks, such as Hadoop, Spark, Hive, and Presto, for large-scale data processing and analysis. This topic describes how to create an EMR on ECS cluster and explains the required configurations to help you quickly set up and manage your big data cluster.
If you create an EMR cluster for the first time after 17:00 (UTC+8) on December 19, 2022, you cannot select the Hadoop, Data Science, Presto, or Zookeeper cluster types.
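The cutoff above is stated in UTC+8, which is easy to misread when your own timestamps are in another time zone. The following is a minimal sketch of that check, not an official tool. The function name `restricted_cluster_types` and the constant names are hypothetical and chosen for illustration only.

```python
from datetime import datetime, timedelta, timezone

# The cutoff stated in this topic: 17:00 (UTC+8) on December 19, 2022.
UTC8 = timezone(timedelta(hours=8))
CUTOFF = datetime(2022, 12, 19, 17, 0, tzinfo=UTC8)

# Cluster types that cannot be selected after the cutoff (from this topic).
RESTRICTED_TYPES = ("Hadoop", "Data Science", "Presto", "Zookeeper")


def restricted_cluster_types(first_creation_time: datetime) -> tuple:
    """Return the cluster types that are unavailable for an account whose
    first EMR cluster is created at the given time (hypothetical helper)."""
    if first_creation_time.tzinfo is None:
        # Naive datetimes cannot be compared with the UTC+8 cutoff safely.
        raise ValueError("first_creation_time must be timezone-aware")
    return RESTRICTED_TYPES if first_creation_time > CUTOFF else ()
```

For example, a first creation at 09:30 UTC on December 19, 2022 is 17:30 in UTC+8, so the four cluster types are reported as unavailable.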
Prerequisites
The RAM authorization is complete. For more information, see Alibaba Cloud account role authorization.
Precautions
For DataLake, DataFlow, DataServing, and Custom clusters of EMR 5.12.1 and later or EMR 3.46.1 and later, if the selected services do not depend on core nodes, you can click Remove Node Group in the Node Group section.
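The precaution above combines three conditions: cluster type, EMR version, and whether the selected services depend on core nodes. The sketch below expresses that rule in Python. It assumes that version strings look like `EMR-5.12.1` or `5.12.1`, and that the 5.12.1 minimum applies to the 5.x line and the 3.46.1 minimum to the 3.x line. The helper names are hypothetical, and other major versions are treated as not eligible because this topic does not cover them.

```python
# Cluster types named in the precaution.
ELIGIBLE_TYPES = {"DataLake", "DataFlow", "DataServing", "Custom"}

# Assumed mapping: minimum version per major release line.
MIN_VERSIONS = {5: (5, 12, 1), 3: (3, 46, 1)}


def parse_version(version: str) -> tuple:
    """Turn 'EMR-5.12.1' or '5.12.1' into (5, 12, 1). Assumed format."""
    numeric = version.split("-")[-1]
    return tuple(int(part) for part in numeric.split("."))


def can_remove_core_node_group(cluster_type: str, version: str,
                               services_need_core: bool) -> bool:
    """Return True if the Remove Node Group option should apply,
    according to the rule described in this topic (hypothetical helper)."""
    if cluster_type not in ELIGIBLE_TYPES or services_need_core:
        return False
    parsed = parse_version(version)
    minimum = MIN_VERSIONS.get(parsed[0])
    # Major versions without a documented minimum are treated as not eligible.
    return minimum is not None and parsed >= minimum
```

Comparing versions as integer tuples avoids the string-comparison pitfall in which "5.9.0" sorts after "5.12.1".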
Procedure
Log on to the E-MapReduce console.
In the top navigation bar, select a region and a resource group as needed.
Region: The cluster is created in the selected region. The region cannot be changed after the cluster is created.
Resource Group: By default, all resources in your account are displayed.
Click Create Cluster.
Configure the cluster as prompted.
When you create a cluster, you must configure software, hardware, and basic settings, and then confirm the order.
Note After a cluster is created, you cannot change its configurations, except for the cluster name. Carefully confirm all configurations before you create the cluster.
After you confirm that all information is correct, click Confirm.
Important
Pay-as-you-go clusters: The cluster creation process starts immediately. After the cluster is created, its status changes to Running.
Subscription clusters: An order is generated. The cluster is created after you complete the payment.
Configuration details
Software configuration
Configuration | Description |
Region | The physical location of the EMR cluster. A region is a geographic area where a data center is located. From the Region drop-down list, select a region close to you to reduce network latency. The region cannot be changed after the cluster is created. |
Business Scenario | Select a scenario based on your actual needs:
|
Product Version | The release version of the EMR product. For more information, see Release versions. |
High Service Availability | This feature is disabled by default. If you enable high availability, EMR creates multiple master nodes to support high availability for ResourceManager and NameNode. EMR distributes these nodes across different underlying hardware to reduce the risk of failure. |
Optional Services (Select One At Least) | Select other services as needed. The related service processes for the selected services will start by default. Important
|
Collect Service Operational Logs | You can enable or disable log collection for all services with one click. This feature is enabled by default, and the collected service operational logs are used only for cluster diagnostics. After the cluster is created, you can change the Collection Status of Service Operational Logs on the Basic Information page. Important If you disable log collection, EMR health checks and technical support are limited, but other features are not affected. For more information about how to disable this feature and the effects of doing so, see How do I stop collecting service logs? |
Metadata | The following methods are supported for storing and managing metadata:
|
Root Storage Directory of Cluster | Configure this parameter when you select the OSS-HDFS service in the Optional Services section. This parameter is not required if you select the HDFS service. Important Buckets created by clicking Create OSS-HDFS Bucket in the EMR console can be read from and written to only through EMR. Operations in the console or through an API are not supported. The first time you use the OSS-HDFS service, the Alibaba Cloud account must complete the authorization as prompted. If you use a Resource Access Management (RAM) user, the Alibaba Cloud account must activate the service and grant the RAM user the AliyunEMRDlsFullAccess permission and the AliyunOSSDlsDefaultRole and AliyunEMRDlsDefaultRole roles. For more information, see Grant permissions to a RAM user. Select a bucket in the same region for which the OSS-HDFS service is activated, or click Create OSS-HDFS Bucket and follow the prompts to create an OSS-HDFS instance as the cluster's root storage path. Note
|
More scenarios
Hardware configuration
Configuration | Description |
Billing Method | The default billing method is subscription. The following billing methods are supported:
|
Zone | A zone is a distinct physical area within the same region. Zones within the same region can communicate with each other over the internal network. You can usually use the default zone. |
VPC | A virtual private cloud (VPC) is an isolated network environment that you define in Alibaba Cloud. You have full control over your VPC. Select an existing VPC, or click Create VPC to go to the VPC console and create a VPC. For more information, see Create and manage a VPC. Note You cannot change the private IP address after the cluster is created because the cluster's private IP is bound to the VPC. |
vSwitch | A vSwitch is a basic network module of a VPC that connects different cloud resources. Select an existing vSwitch, or click Create vSwitch to go to the VPC console and create a vSwitch. For more information, see Create and manage a vSwitch. |
Default Security Group | A security group is a virtual firewall that controls the inbound and outbound traffic of instances within the security group. For more information, see Security group overview. Select an existing security group, or click create a new security group to go to the ECS console and create one. For more information, see Create a security group. Important Do not use advanced security groups created on ECS. |
Node Group | Select an instance type as needed. For more information, see Instance families.
|
Cluster Scaling | Select a scaling rule as needed:
Note
|
Basic configuration
Configuration Item | Description |
Cluster Name | The name of the cluster. The name must be 1 to 64 characters in length and can contain Chinese characters, letters, digits, hyphens (-), and underscores (_). |
Identity Credentials | The Identity Credentials are used to securely log on to the cluster's master node. For logon operations, see Log on to a cluster. The following identities are supported:
|
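The Cluster Name rule above (1 to 64 characters; Chinese characters, letters, digits, hyphens, and underscores) can be checked before you submit the form. The following is a minimal sketch, not the console's actual validation logic. It assumes that "Chinese characters" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and that "letters" means ASCII letters. The name `is_valid_name` is hypothetical.

```python
import re

# Assumption: Chinese characters are limited to U+4E00..U+9FFF and letters
# are ASCII. The console may accept a wider or narrower set.
NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,64}")


def is_valid_name(name: str) -> bool:
    """Check a cluster name against the documented length and character rule."""
    return NAME_PATTERN.fullmatch(name) is not None
```

The same rule is documented for Cluster Template Name, so the check can be reused there.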
Confirm order
(Optional) Save as Cluster Template: If you select Key Pair for identity authentication, you can click Save as Cluster Template to save the current cluster configuration as a template.
In the Save as Cluster Template dialog box, enter a Cluster Template Name and select a Cluster Template Resource Group.
Parameter | Description |
Cluster Template Name | Enter a name for the cluster template to facilitate later management. The name must be 1 to 64 characters in length and can contain only Chinese characters, letters, digits, hyphens (-), and underscores (_). |
Cluster Template Resource Group | Select an existing resource group as needed to manage templates by group. To create a new resource group, click Create Resource Group. For more information, see Create a resource group. |
Click OK.
A new cluster template is added to the Manage Cluster Templates panel. For more information about cluster templates, see Create a cluster template.
FAQ
Related documents
For FAQs about creating clusters, see FAQ about cluster management.
To add services after a cluster is created, see Add a service.
For information about how to log on to a cluster, see Log on to a cluster.
For information about how to select an instance type, see ECS instance types.
For FAQs about using various components, see FAQ.
For information about using the API, see CreateCluster.