This topic describes how to create and configure a StarRocks cluster.
Prerequisites
A virtual private cloud (VPC) and a vSwitch are created in the region where you want to create a StarRocks cluster. For more information, see Create and manage a VPC and Create and manage a vSwitch.
Procedure
Go to the cluster creation page.
Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
Optional. In the top navigation bar, select the region where you want to create a cluster and select a resource group based on your business requirements.
The region of a cluster cannot be changed after the cluster is created.
All resource groups within your account are displayed by default.
On the EMR on ECS page, click Create Cluster.
Configure the cluster.
The wizard guides you through three groups of settings: software parameters, hardware parameters, and basic parameters.
Important: After a cluster is created, you cannot modify its parameters except for the cluster name. Make sure that all parameters are correctly configured when you create a cluster.
Configure software parameters.
Parameter
Example
Description
Region
China (Hangzhou)
The region in which you want to create the cluster. You cannot change the region of a cluster after the cluster is created.
Business Scenario
Data Analytics
Select Data Analytics.
Product Version
EMR-5.17.0
The version of EMR. By default, the latest version is selected.
High Service Availability
Off
By default, this switch is turned off. If you turn on this switch, three master nodes are created in the cluster to ensure the availability of the ResourceManager and NameNode processes. You can also modify the number of master nodes.
Optional Services (Select One At Least)
Starrocks2
The other services that you can select based on your business requirements. By default, the relevant processes for the services you specify are started.
Collect Service Operational Logs
On
Specifies whether to enable log collection for all services. By default, this switch is turned on to collect the service operational logs of your cluster. The logs are used only for cluster diagnostics.
After you create a cluster, you can modify the Collection Status of Service Operational Logs parameter on the Basic Information tab.
Important: If you turn off this switch, the EMR cluster health check and service-related technical support are limited. For more information about how to disable log collection and the impact of disabling it, see How do I stop collection of service operational logs?
DLF Unified Metadata
Selected
By default, the check box is selected. This indicates that metadata is stored in Data Lake Formation (DLF).
After you activate DLF, the system selects a DLF catalog for you to store metadata. The ID of your account is used by default. If you want different clusters to be associated with different DLF catalogs, you can perform the following operations to create DLF catalogs:
Click Create Catalog. In the popover that appears, enter a catalog ID and click OK.
Select the catalog that you created from the DLF Catalog drop-down list.
Advanced Settings
Off
Custom Software Configuration: customizes software settings. You can use a JSON file to customize the parameters of basic components required for a cluster, such as Hadoop, Spark, and Hive. By default, this switch is turned off.
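As an illustration of the Custom Software Configuration option, the following sketch builds a JSON document of the kind such a file might contain. The field names (ApplicationName, ConfigFileName, ConfigItemKey, ConfigItemValue), the HIVE and hive-site targets, and the property shown are assumptions that are not stated in this topic. Check the schema that the EMR console expects before you use a file like this.

```python
import json


def build_custom_config(items):
    """Serialize (application, config_file, key, value) tuples as a JSON array.

    The field names below are assumed for illustration and are not
    confirmed by this topic.
    """
    entries = [
        {
            "ApplicationName": application,
            "ConfigFileName": config_file,
            "ConfigItemKey": key,
            "ConfigItemValue": str(value),
        }
        for application, config_file, key, value in items
    ]
    return json.dumps(entries, indent=2)


# Hypothetical example: one Hive property in hive-site.
custom_config = build_custom_config([
    ("HIVE", "hive-site", "hive.exec.parallel", "true"),
])
print(custom_config)
```

Generating the file from a list of tuples keeps every entry in the same shape, which makes a malformed entry less likely than in a hand-edited file.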
Configure hardware parameters.
Parameter
Example
Description
Billing Method
Pay-as-you-go
Subscription is selected by default. EMR supports the following billing methods:
Pay-as-you-go: a billing method that allows you to pay for a cluster after you use the cluster. The system charges you for a cluster based on the hours the cluster is actually used. Bills are generated on an hourly basis at the top of every hour. We recommend that you use pay-as-you-go clusters for short-term test jobs or dynamically scheduled jobs.
Subscription: a billing method that allows you to use a cluster only after you pay for the cluster.
Note: We recommend that you create a pay-as-you-go cluster for a test run. If the cluster passes the test, you can create a subscription cluster for production.
Zone
Zone I
The zone where you want to create a cluster. Zones are different geographical areas located in the same region. They are interconnected by an internal network. In most cases, you can use the zone selected by default.
VPC
starrocks_test/vpc-bp1f4epmkvncimpgs****
By default, an existing VPC is selected.
If you want to use a new VPC, go to the VPC console to create one. For more information, see Create and manage a VPC.
vSwitch
vsw_test/vsw-bp1e2f5fhaplp0g6p****
Select a vSwitch in the specified zone of the VPC. If no vSwitch is available in the zone, go to the VPC console to create a vSwitch in the zone. For more information, see Create and manage a vSwitch.
Default Security Group
sg-bp1ddw7sm2risw****/sg-bp1ddw7sm2risw****
The security group of the cluster. By default, an existing security group is selected. For more information about security groups, see Overview.
You can also click create a new security group to create a security group in the Elastic Compute Service (ECS) console. For more information, see Create a security group.
Important: Do not use an advanced security group that is created in the ECS console.
Node Group
Default values
The node groups of the cluster. You can select instance types based on your business requirements. For more information, see Instance families.
Master node group: runs control processes, such as ResourceManager and NameNode.
Core node group: stores all the data of a cluster. You can add core nodes based on your business requirements after a cluster is created.
Task node group: stores no data and is used to adjust the computing capabilities of clusters. No task node group is configured by default. You can configure a task node group based on your business requirements.
Add to Deployment Set: If you turn on the High Service Availability switch, the master nodes are added to a deployment set by default. A deployment set is used to control the distribution of ECS instances. For more information, see Deployment set.
System Disk: You can select a standard SSD, enhanced SSD, or ultra disk based on your business requirements. You can adjust the size of the system disk based on your business requirements.
Data Disk: You can select standard SSDs, enhanced SSDs, or ultra disks based on your business requirements. You can adjust the size of the data disks based on your business requirements.
Note: If you select enhanced SSDs, you can specify a performance level (PL) based on the disk capacity to meet the performance requirements of your cluster. The default performance level is PL1. The system disk supports PL0, PL1, and PL2. Data disks support PL0, PL1, PL2, and PL3. For more information, see Disks.
Instances: One master node is configured by default. If you turn on the High Service Availability switch, multiple master nodes can be configured.
Two core nodes are configured in the core node group by default. You can change the number of core nodes based on your business requirements.
Additional Security Group: An additional security group allows interactions between different external resources and applications. You can associate a node group with up to two additional security groups.
Assign Public Network IP: specifies whether to associate an EIP address with the cluster. This switch is turned off by default. You can assign public IP addresses only to the node groups of DataLake clusters.
Note: If you leave this switch turned off but want to access the cluster over the Internet after the cluster is created, you must apply for a public IP address on ECS. For information about how to apply for an EIP address, see Elastic IP addresses.
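The limits described for node groups can be checked before you open the wizard. The following sketch is a minimal example that covers only two rules stated in the preceding table: the performance levels allowed for enhanced SSDs and the maximum of two additional security groups per node group. The function name, parameter names, and message texts are hypothetical and are not part of any EMR API.

```python
# Performance levels of enhanced SSDs, as listed in the table above.
SYSTEM_DISK_PLS = {"PL0", "PL1", "PL2"}
DATA_DISK_PLS = {"PL0", "PL1", "PL2", "PL3"}
MAX_ADDITIONAL_SECURITY_GROUPS = 2


def check_node_group(system_disk_pl="PL1", data_disk_pl="PL1",
                     additional_security_groups=()):
    """Return a list of problems. An empty list means no rule is violated."""
    problems = []
    if system_disk_pl not in SYSTEM_DISK_PLS:
        problems.append(f"system disk does not support {system_disk_pl}")
    if data_disk_pl not in DATA_DISK_PLS:
        problems.append(f"data disks do not support {data_disk_pl}")
    if len(additional_security_groups) > MAX_ADDITIONAL_SECURITY_GROUPS:
        problems.append("a node group supports at most two additional security groups")
    return problems


# PL3 is valid for data disks but not for the system disk.
print(check_node_group(system_disk_pl="PL3", data_disk_pl="PL3"))
```

The defaults mirror the default performance level PL1, so calling the function without arguments represents an unmodified node group.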
Configure basic parameters.
Configure parameters in the Basic Configuration step.
Parameter
Example
Description
Cluster Name
Emr-StarRocks
The name of the cluster. The name must be 1 to 64 characters in length and can contain only letters, digits, hyphens (-), and underscores (_).
Identity Credentials
Password
Key Pair: the SSH key pair that is used to log on to a Linux instance. This option is selected by default.
For information about how to use a key pair, see Overview.
Password: the password that is used to log on to the master node (Linux instance).
The password must be 8 to 30 characters in length and must contain uppercase letters, lowercase letters, digits, and special characters.
The following special characters are supported: ! @ # $ % ^ & *
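The naming and password rules in the preceding table can be validated locally before you submit the form. The following sketch is one possible check, under two assumptions that this topic does not state: "letters" means ASCII letters, and a password may contain no characters other than letters, digits, and the listed special characters. The function names are hypothetical.

```python
import re

# 1 to 64 characters: letters, digits, hyphens (-), and underscores (_).
# Assumption: "letters" means ASCII letters only.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")

# Special characters listed for the logon password.
SPECIAL_CHARS = set("!@#$%^&*")


def is_valid_name(name):
    """Check a cluster name or cluster template name against the naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


def is_valid_password(password):
    """Check length, allowed characters, and the four required character classes."""
    if not 8 <= len(password) <= 30:
        return False
    # Assumption: no characters outside the four listed classes are allowed.
    only_allowed = all(
        ch.isascii() and (ch.isalnum() or ch in SPECIAL_CHARS) for ch in password
    )
    return (
        only_allowed
        and any(ch.isupper() for ch in password)
        and any(ch.islower() for ch in password)
        and any(ch.isdigit() for ch in password)
        and any(ch in SPECIAL_CHARS for ch in password)
    )


print(is_valid_name("Emr-StarRocks"))
print(is_valid_password("Abcdef1!"))
```

The same name check applies to cluster template names, which follow the identical length and character rule.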
In the Confirm step, read the terms of service and select the check box.
Optional. If a key pair is used for identity authentication, you can click Save as Cluster Template to save the configurations of the current cluster as a cluster template.
In the Save as Cluster Template dialog box, configure the Cluster Template Name and Cluster Template Resource Group parameters.
Parameter
Description
Cluster Template Name
Enter a cluster template name to facilitate template management. The name must be 1 to 64 characters in length and can contain only letters, digits, hyphens (-), and underscores (_).
Cluster Template Resource Group
Select an existing resource group based on your business requirements to manage cluster templates by group.
If you want to use a new resource group, click Create Resource Group to create one. For more information, see Create a resource group.
Click OK.
A cluster template is created in the Manage Cluster Templates panel. For more information about cluster templates, see Create a cluster template.
Click Confirm.
Refresh the page to view the creation progress. When Status becomes Running, the cluster is created.
FAQ
Q: How are the frontend (FE) and backend (BE) processes of StarRocks deployed on the master and core nodes of a cluster?
A: The FE process of StarRocks is deployed on the master node. One master node is configured by default. If you turn on High Service Availability when you create a cluster, three master nodes are configured by default. Each master node is configured with an FE process. The High Service Availability feature provides fault tolerance and load balancing capabilities.
By default, one BE process of StarRocks is deployed on each core node. To change the number of BE processes, change the number of core nodes.
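The deployment described in this answer can be summarized as a small calculation. The following sketch uses only the defaults stated in this topic: one master node, three master nodes when High Service Availability is turned on, two core nodes, one FE process per master node, and one BE process per core node. The function name and the dictionary keys are hypothetical, and the sketch does not cover a manually changed number of master nodes.

```python
def starrocks_process_layout(high_service_availability=False, core_nodes=2):
    """Return node and process counts for the default StarRocks deployment."""
    if core_nodes < 1:
        raise ValueError("core_nodes must be at least 1")
    # Default master node counts: 1, or 3 with High Service Availability.
    master_nodes = 3 if high_service_availability else 1
    return {
        "master_nodes": master_nodes,
        "fe_processes": master_nodes,  # one FE process per master node
        "core_nodes": core_nodes,
        "be_processes": core_nodes,    # one BE process per core node
    }


default_layout = starrocks_process_layout()
ha_layout = starrocks_process_layout(high_service_availability=True, core_nodes=5)
print(default_layout)
print(ha_layout)
```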