E-MapReduce: Create a StarRocks cluster

Last Updated: Nov 06, 2024

This topic describes how to create and configure a StarRocks cluster.

Prerequisites

A virtual private cloud (VPC) and a vSwitch are created in the region where you want to create a StarRocks cluster. For more information, see Create and manage a VPC and Create and manage a vSwitch.

Procedure

  1. Go to the cluster creation page.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. Optional. In the top navigation bar, select the region where you want to create a cluster and select a resource group based on your business requirements.

      • The region of a cluster cannot be changed after the cluster is created.

      • All resource groups within your account are displayed by default.

    3. On the EMR on ECS page, click Create Cluster.

  2. Configure the cluster.

    To create a cluster, you must configure software parameters, hardware parameters, and basic parameters as guided by the wizard.

    Important

    After a cluster is created, you cannot modify its parameters except for the cluster name. Make sure that all parameters are correctly configured when you create a cluster.

    1. Configure software parameters.

      Parameter

      Example

      Description

      Region

      China (Hangzhou)

      The region in which you want to create the cluster. You cannot change the region of a cluster after the cluster is created.

      Business Scenario

      Data Analytics

      Select Data Analytics.

      Product Version

      EMR-5.17.0

      The version of EMR. By default, the latest version is selected.

      High Service Availability

      Off

      By default, this switch is turned off. If you turn on this switch, three master nodes are created in the cluster to ensure the availability of the ResourceManager and NameNode processes. You can also modify the number of master nodes.

      Optional Services (Select One At Least)

      Starrocks2

      The other services that you can select based on your business requirements. By default, the relevant processes for the services you specify are started.

      Collect Service Operational Logs

      On

      Specifies whether to enable log collection for all services. By default, this switch is turned on to collect the service operational logs of your cluster. The logs are used only for cluster diagnostics.

      After you create a cluster, you can modify the Collection Status of Service Operational Logs parameter on the Basic Information tab.

      Important

      If you turn off this switch, the EMR cluster health check and service-related technical support are limited. For more information about how to disable log collection and the impact of disabling it, see How do I stop collection of service operational logs?

      DLF Unified Metadata

      Selected

      By default, the check box is selected. This indicates that metadata is stored in Data Lake Formation (DLF).

      After you activate DLF, the system selects a DLF catalog for you to store metadata. The ID of your account is used by default. If you want different clusters to be associated with different DLF catalogs, you can perform the following operations to create DLF catalogs:

      1. Click Create Catalog. In the popover that appears, enter a catalog ID and click OK.

      2. Select the catalog that you created from the DLF Catalog drop-down list.

      Advanced Settings

      Off

      Custom Software Configuration: customizes software settings. You can use a JSON file to customize the parameters of basic components required for a cluster, such as Hadoop, Spark, and Hive. By default, this switch is turned off.

    2. Configure hardware parameters.

      Parameter

      Example

      Description

      Billing Method

      Pay-as-you-go

      Subscription is selected by default. EMR supports the following billing methods:

      • Pay-as-you-go: a billing method that allows you to pay for a cluster after you use the cluster. The system charges you for a cluster based on the hours the cluster is actually used. Bills are generated on an hourly basis at the top of every hour. We recommend that you use pay-as-you-go clusters for short-term test jobs or dynamically scheduled jobs.

      • Subscription: a billing method that allows you to use a cluster only after you pay for the cluster.

        Note

        We recommend that you create a pay-as-you-go cluster for a test run. If the cluster passes the test, you can create a subscription cluster for production.

      Zone

      Zone I

      The zone where you want to create a cluster. Zones are different geographical areas located in the same region. They are interconnected by an internal network. In most cases, you can use the zone selected by default.

      VPC

      starrocks_test/vpc-bp1f4epmkvncimpgs****

      By default, an existing VPC is selected.

      If you want to use a new VPC, go to the VPC console to create one. For more information, see Create and manage a VPC.

      vSwitch

      vsw_test/vsw-bp1e2f5fhaplp0g6p****

      Select a vSwitch in the specified zone of the VPC. If no vSwitch is available in the zone, go to the VPC console to create a vSwitch in the zone. For more information, see Create and manage a vSwitch.

      Default Security Group

      sg-bp1ddw7sm2risw****/sg-bp1ddw7sm2risw****

      The security group of the cluster. By default, an existing security group is selected. For more information about security groups, see Overview.

      You can also click create a new security group to create a security group in the Elastic Compute Service (ECS) console. For more information, see Create a security group.

      Important

      Do not use an advanced security group that is created in the ECS console.

      Node Group

      Default values

      The node groups of the cluster. You can select instance types based on your business requirements. For more information, see Instance families.

      • Master node group: runs control processes, such as ResourceManager and NameNode.

      • Core node group: stores all the data of a cluster. You can add core nodes based on your business requirements after a cluster is created.

      • Task node group: stores no data and is used to adjust the computing capabilities of clusters. No task node group is configured by default. You can configure a task node group based on your business requirements.

      • Add to Deployment Set: If you turn on the High Service Availability switch, the master nodes are added to a deployment set by default. A deployment set is used to control the distribution of ECS instances. For more information, see Deployment set.

      • System Disk: You can select a standard SSD, enhanced SSD, or ultra disk and adjust the size of the system disk based on your business requirements.

      • Data Disk: You can select standard SSDs, enhanced SSDs, or ultra disks and adjust the size of the data disks based on your business requirements.

        Note

        If you select enhanced SSDs, you can specify different performance levels (PLs) for the enhanced SSDs based on the disk capacity to meet different cluster performance requirements. The default performance level is PL1. When you configure the system disk, you can select an enhanced SSD of the following performance levels: PL0, PL1, and PL2. When you configure data disks, you can select enhanced SSDs of the following performance levels: PL0, PL1, PL2, and PL3. For more information, see Disks.

      • Instances: One master node is configured by default. If you turn on the High Service Availability switch, multiple master nodes can be configured.

        Two core nodes are configured in the core node group by default. You can change the number of core nodes based on your business requirements.

      • Additional Security Group: An additional security group allows interactions between different external resources and applications. You can associate a node group with up to two additional security groups.

      • Assign Public Network IP: specifies whether to associate an EIP address with the cluster. This switch is turned off by default. You can assign public IP addresses only to the node groups of DataLake clusters.

        Note

        If you do not turn on this switch but want to access the cluster over the Internet after you create the cluster, you must apply for a public IP address on ECS. For information about how to apply for an EIP address, see Elastic IP addresses.

    3. Configure basic parameters.

      Configure parameters in the Basic Configuration step.

      Parameter

      Example

      Description

      Cluster Name

      Emr-StarRocks

      The name of the cluster. The name must be 1 to 64 characters in length and can contain only letters, digits, hyphens (-), and underscores (_).

      Identity Credentials

      Password

      Key Pair: the SSH key pair that is used to log on to a Linux instance. This value is selected by default.

      For information about how to use a key pair, see Overview.

      Password: the password that is used to log on to the master node (Linux instance).

      The password must be 8 to 30 characters in length and must contain uppercase letters, lowercase letters, digits, and special characters.

      The following special characters are supported: ! @ # $ % ^ & *

      (Optional) Advanced Settings

      Parameter

      Description

      ECS Application Role

      You can assign an ECS application role to a cluster. EMR applies for a temporary AccessKey pair when applications running on the compute nodes of the cluster access other Alibaba Cloud services, such as OSS. This way, you do not need to manually enter an AccessKey pair. You can grant the application role access permissions on specific Alibaba Cloud services based on your business requirements.

      Bootstrap Actions

      You can configure bootstrap actions to run custom scripts before a cluster starts. You can use bootstrap actions to install third-party software and modify the runtime environment of your clusters. For more information, see Manage bootstrap actions.

      Tag

      You can add a tag when you create a cluster or add a tag on the Basic Information tab after a cluster is created. Tags help you identify and manage cluster resources. For more information, see Manage and use tags.

      Resource Group

      You can group your resources based on usage, permissions, and ownership. For more information, see Use resource groups.

      Data Disk Encryption

      You can turn on this switch only when you create a cluster. If you turn on this switch, both data in transit and data at rest on the disk are encrypted. For more information, see Enable data disk encryption.

      Remarks

      Remarks are used to record important information about an EMR cluster. You can modify the remarks on the Basic Information tab after the cluster is created. If you do not configure the Remarks parameter when you create a cluster, you can add remarks after the cluster is created.

  3. In the Confirm step, read the terms of service and select the check box.

  4. Optional. If a key pair is used for identity authentication, you can click Save as Cluster Template to save the configurations of the current cluster as a cluster template.

    1. In the Save as Cluster Template dialog box, configure the Cluster Template Name and Cluster Template Resource Group parameters.

      Parameter

      Description

      Cluster Template Name

      Enter a cluster template name to facilitate template management. The name must be 1 to 64 characters in length and can contain only letters, digits, hyphens (-), and underscores (_).

      Cluster Template Resource Group

      Select an existing resource group based on your business requirements to manage cluster templates by group.

      If you want to use a new resource group, click Create Resource Group to create one. For more information, see Create a resource group.

    2. Click OK.

      A cluster template is created in the Manage Cluster Templates panel. For more information about cluster templates, see Create a cluster template.

  5. Click Confirm.

    Refresh the page to view the creation progress. When Status becomes Running, the cluster is created.
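
Because most parameters cannot be changed after the cluster is created, it can help to check planned values against the limits stated in this procedure before you open the wizard. The following Python snippet is a minimal sketch of such a check, not part of EMR. The function names and the dictionary keys (cluster_name, password, system_disk_pl, data_disk_pl, additional_security_groups) are hypothetical. It also assumes that "letters" means ASCII letters and that any password character outside the listed special characters is rejected.

```python
import re

# Limits copied from the parameter descriptions in this topic.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")  # assumption: ASCII letters only
PASSWORD_SPECIALS = set("!@#$%^&*")
SYSTEM_DISK_PLS = {"PL0", "PL1", "PL2"}
DATA_DISK_PLS = {"PL0", "PL1", "PL2", "PL3"}
MAX_ADDITIONAL_SECURITY_GROUPS = 2


def check_name(name):
    """Cluster and template names: 1 to 64 letters, digits, hyphens, underscores."""
    return NAME_PATTERN.fullmatch(name) is not None


def check_password(password):
    """8 to 30 characters with uppercase, lowercase, digit, and special character."""
    if not 8 <= len(password) <= 30:
        return False
    # Assumption: characters outside ASCII letters, digits, and the listed
    # special characters are treated as unsupported.
    if not all((c.isascii() and c.isalnum()) or c in PASSWORD_SPECIALS
               for c in password):
        return False
    return (
        any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in PASSWORD_SPECIALS for c in password)
    )


def preflight(config):
    """Return a list of problems; an empty list means no stated limit is violated."""
    problems = []
    if not check_name(config["cluster_name"]):
        problems.append("cluster name: 1-64 letters, digits, hyphens, underscores")
    # The password rule applies only when Password is the identity credential.
    if "password" in config and not check_password(config["password"]):
        problems.append("password: 8-30 chars with upper, lower, digit, special")
    # PL1 is the default performance level for enhanced SSDs.
    if config.get("system_disk_pl", "PL1") not in SYSTEM_DISK_PLS:
        problems.append("system disk: enhanced SSD supports PL0, PL1, PL2")
    if config.get("data_disk_pl", "PL1") not in DATA_DISK_PLS:
        problems.append("data disk: enhanced SSD supports PL0, PL1, PL2, PL3")
    groups = config.get("additional_security_groups", [])
    if len(groups) > MAX_ADDITIONAL_SECURITY_GROUPS:
        problems.append("node group: at most two additional security groups")
    return problems


if __name__ == "__main__":
    plan = {"cluster_name": "Emr-StarRocks", "password": "Abcdef1!"}
    print(preflight(plan))  # prints []
```

The check returns a list of messages instead of raising on the first failure so that all problems in a plan are reported together. The console remains the authority on which values are accepted.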

FAQ

Q: How are the frontend (FE) and backend (BE) processes of StarRocks deployed on the master and core nodes of a cluster?

A: The FE process of StarRocks is deployed on the master node. One master node is configured by default. If you turn on High Service Availability when you create a cluster, three master nodes are configured by default. Each master node is configured with an FE process. The High Service Availability feature provides fault tolerance and load balancing capabilities.

By default, a BE process of StarRocks is deployed on each core node. To change the number of BE processes, adjust the number of core nodes.
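
The deployment rule in this answer can be expressed as a short calculation. The following Python snippet is an illustrative sketch only. The function name and its parameters are hypothetical, and it assumes that a cluster without High Service Availability always has exactly one master node, as the Instances description states.

```python
def starrocks_process_layout(core_nodes=2, high_availability=False, master_nodes=None):
    """Return the number of FE and BE processes for a planned cluster.

    One FE process runs on each master node and one BE process runs on
    each core node. Defaults follow this topic: one master node, or three
    with High Service Availability turned on, and two core nodes.
    """
    if master_nodes is None:
        master_nodes = 3 if high_availability else 1
    if not high_availability and master_nodes != 1:
        # Multiple master nodes can be configured only with the switch on.
        raise ValueError("multiple master nodes require High Service Availability")
    return {"FE": master_nodes, "BE": core_nodes}


if __name__ == "__main__":
    print(starrocks_process_layout())
    # prints {'FE': 1, 'BE': 2}
    print(starrocks_process_layout(core_nodes=5, high_availability=True))
    # prints {'FE': 3, 'BE': 5}
```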