This topic describes how to create and configure a Dataflow Kafka cluster, that is, a Dataflow cluster that is deployed with the Kafka service.
Usage notes
When you create a Dataflow Kafka cluster, you must select an appropriate Elastic Compute Service (ECS) instance type and determine the number of brokers based on the estimated load of your business. Business scenarios vary too widely for a general cluster plan to apply, so size the cluster based on your actual environment. In most cases, we recommend that you consider the following items when you select an instance type:
- Deploy Kafka brokers on ECS instances whose CPU-to-memory ratio is 1:4.
- Use cloud disks to store data.
- Consider the relationship between the I/O throughput of cloud disks and the network interface controller (NIC) bandwidth.
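The last item can be made concrete with a rough calculation: the sustained write rate of a broker is capped by whichever is smaller, the combined throughput of its cloud disks or the bandwidth of its NIC. The following is a minimal sketch, not an official sizing formula. The function name and the sample figures are hypothetical, and the calculation ignores replication and consumer traffic, which also consume NIC bandwidth.

```python
def broker_throughput_ceiling(disk_count, disk_throughput_mbps, nic_bandwidth_gbps):
    """Return (ceiling in MB/s, limiting resource) for one broker.

    All inputs are assumptions that you read from the specifications of
    the instance type and the cloud disk category you plan to use.
    """
    disk_total = disk_count * disk_throughput_mbps
    # Convert NIC bandwidth from Gbit/s to MB/s (1 Gbit/s = 125 MB/s).
    nic_total = nic_bandwidth_gbps * 1000 / 8
    if disk_total <= nic_total:
        return disk_total, "disk"
    return nic_total, "nic"


# Illustrative figures only: 4 disks at 350 MB/s each behind a 10 Gbit/s NIC.
ceiling, bottleneck = broker_throughput_ceiling(4, 350, 10)
print(ceiling, bottleneck)
```

If the result names the NIC as the limit, adding more disks does not raise throughput. If it names the disks, a larger NIC does not help either.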
Consider the following factors when you configure the deployment parameters:
- The Kafka versions used in E-MapReduce (EMR) depend on the ZooKeeper service, so the availability of ZooKeeper determines whether the Kafka service is highly available. Therefore, we recommend that you turn on High Service Availability when you create a cluster. In this case, three nodes are deployed for the ZooKeeper service.
- If the master node group is only used to deploy ZooKeeper, you need to configure only one data disk for the master node group.
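The reason three ZooKeeper nodes improve availability is that a ZooKeeper ensemble stays available only while a majority of its nodes are up. The following sketch shows that arithmetic. The function name is hypothetical and is not part of any EMR or ZooKeeper API.

```python
def zookeeper_fault_tolerance(node_count):
    """Return (quorum size, number of node failures the ensemble survives)."""
    # A majority quorum is more than half of the nodes.
    quorum = node_count // 2 + 1
    return quorum, node_count - quorum


for nodes in (1, 3):
    quorum, tolerated = zookeeper_fault_tolerance(nodes)
    print(f"{nodes} node(s): quorum={quorum}, tolerated failures={tolerated}")
```

A single ZooKeeper node tolerates no failure, whereas the three nodes deployed with High Service Availability tolerate the loss of one node.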
For more information about evaluation-based suggestions, see Suggestions for estimating cluster resources.
Procedure
- Go to the cluster creation page.
- Configure the cluster. To create a cluster, you must configure software parameters, hardware parameters, and basic parameters as guided by the wizard.
  Important: After a cluster is created, you cannot modify its parameters except for the cluster name. Make sure that all parameters are correctly configured when you create a cluster.
- In the Confirm step, read the terms of service and select the check box.
- Click Confirm. Refresh the EMR on ECS page to view the creation progress. When Status becomes Running, the cluster is created.
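If you script the wait instead of refreshing the page, the last step amounts to polling until the status reads Running. The following is a generic sketch only. `get_status` is a hypothetical callable that you supply (for example, a wrapper around whatever SDK or CLI call you use to read the cluster status), and every status value other than Running is a made-up placeholder.

```python
import time


def wait_until_running(get_status, interval_seconds=30.0,
                       timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_status() until it returns "Running" or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status == "Running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"cluster still in status {status!r} after {timeout_seconds} s"
            )
        sleep(interval_seconds)


# Usage with a stand-in status source; "Starting" is a placeholder value.
fake_statuses = iter(["Starting", "Starting", "Running"])
print(wait_until_running(lambda: next(fake_statuses), interval_seconds=0))
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.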
What to do next
After the cluster is created, you can modify the values of the default parameters of the cluster to meet production requirements. Examples:
- Specify whether to enable the SSL encryption feature for an EMR Kafka cluster. For more information, see Use SSL to encrypt Kafka data links.
- Specify whether to enable the Simple Authentication and Security Layer (SASL) feature to perform logon authentication for an EMR Kafka cluster. For more information, see Log on to a Kafka cluster by using SASL.
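These two settings also determine how clients must connect. Apache Kafka clients choose the transport with the `security.protocol` property, which takes one of four values depending on whether encryption and SASL authentication are in use. The following sketch maps the two choices to that value. The function name is hypothetical, and the remaining client properties (truststore, SASL mechanism, credentials) depend on your cluster and are covered in the linked topics.

```python
def kafka_security_protocol(ssl_enabled, sasl_enabled):
    """Return the Kafka client `security.protocol` value for the given choices."""
    if sasl_enabled:
        return "SASL_SSL" if ssl_enabled else "SASL_PLAINTEXT"
    return "SSL" if ssl_enabled else "PLAINTEXT"


# Print the value for every combination of the two settings.
for ssl_on in (False, True):
    for sasl_on in (False, True):
        print(f"SSL={ssl_on}, SASL={sasl_on}: "
              f"security.protocol={kafka_security_protocol(ssl_on, sasl_on)}")
```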