This topic describes how to use an Alibaba Cloud account to log on to the Alibaba Cloud E-MapReduce (EMR) console, create clusters on the EMR on ACK page, and then run jobs in the console.
Precautions
In this topic, the desired JAR file is packaged into an image. If you are using your own JAR file, you can upload the JAR file to Alibaba Cloud Object Storage Service (OSS). For more information about how to upload a file, see Simple upload.
In this case, you need to replace local:///opt/spark/examples/spark-examples.jar in the command with the actual path in which the JAR file is stored in OSS. The path is in the oss://<yourBucketName>/<path>.jar format.
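For example, if you use the ossutil command-line tool to upload the JAR file, the steps might look like the following sketch. This assumes that ossutil is installed and configured; the bucket name and object path are placeholders that you must replace with your own values.
# Upload the JAR file to OSS (assumes ossutil is installed and configured).
ossutil cp ./spark-examples.jar oss://<yourBucketName>/<path>.jar
# Then, in the SparkApplication manifest shown later in this topic, point
# mainApplicationFile at the OSS object instead of the local path:
#   mainApplicationFile: "oss://<yourBucketName>/<path>.jar"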
Preparations
Before you can create a cluster on the EMR on ACK page, you must perform the following operations in the Container Service for Kubernetes (ACK) console:
Create an ACK cluster. For more information, see Create an ACK dedicated cluster or Create an ACK managed cluster.
Attach the AliyunOSSFullAccess and AliyunDLFFullAccess policies to the Alibaba Cloud account. For more information, see Attach policies to a RAM role.
If you want to store the JAR package in Alibaba Cloud Object Storage Service (OSS), you must activate OSS first. For more information, see Activate OSS.
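After the ACK cluster is created and you have obtained its kubeconfig file, you can optionally run a quick check from your local machine to confirm that the cluster is reachable. This is a minimal sketch that assumes kubectl is already configured to use the cluster's kubeconfig file.
# List the worker nodes of the ACK cluster to confirm connectivity.
kubectl get nodes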
Step 1: Assign a role
Before you can use EMR on ACK, your Alibaba Cloud account must be assigned the system default role AliyunEMROnACKDefaultRole. For more information, see Assign a role to an Alibaba Cloud account.
Step 2: Create a cluster
Create a Spark cluster on the EMR on ACK page. For more information, see Create a cluster.
Log on to the EMR console. In the left-side navigation pane, click EMR on ACK.
On the EMR on ACK page, click Create Cluster.
On the E-MapReduce on ACK page, configure the parameters described below.

Region
Example: China (Hangzhou)
Description: The region in which you want to create the cluster. You cannot change the region after the cluster is created.

Cluster Type
Example: Spark
Description: The type of the cluster. Spark is a common distributed big data processing engine that provides various capabilities, such as extract, transform, and load (ETL), batch processing, and data modeling.
Important: If you want to associate a Spark cluster with a Shuffle Service cluster, the major EMR versions of the two clusters must be the same. For example, a Spark cluster whose EMR version is EMR-5.x-ack can be associated only with a Shuffle Service cluster whose EMR version is EMR-5.x-ack.

Product Version
Example: EMR-5.6.0-ack
Description: The version of EMR. By default, the latest version is used.

Component Version
Example: SPARK (3.2.1)
Description: The type and version of the component that is deployed in the cluster of the specified type.

ACK Cluster
Example: Emr-ack
Description: Select an existing ACK cluster or create an ACK cluster in the ACK console.
Important: The same ACK cluster cannot be associated with multiple clusters of the same type that are created on the EMR on ACK page.
You can click Configure Dedicated Nodes to configure EMR-dedicated nodes. You can dedicate a node or node pool to EMR by adding taints and labels to the node or node pool. A command-line sketch is provided at the end of this step.
Note: We recommend that you configure dedicated nodes in a node pool. If no node pool is available, create one. For more information about how to create a node pool, see Create a node pool. For more information about node pools, see Node pool overview.

OSS Bucket
Example: oss-spark-test
Description: Select an existing bucket or create a bucket in the OSS console.

Cluster Name
Example: Emr-Spark
Description: The name of the cluster. The name must be 1 to 64 characters in length and can contain only letters, digits, hyphens (-), and underscores (_).
Click Create.
If the status of the cluster changes to Running, the cluster is created.
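If you configure EMR-dedicated nodes yourself instead of letting the console apply the settings, the taints and labels are added with standard kubectl operations. The following is a sketch for reference only: the node name and the taint and label key-value pairs are hypothetical placeholders, and you should use the exact keys and values that the EMR console displays.
# Taint a node so that only pods that tolerate the taint are scheduled on it.
# The key and value below are placeholders, not the exact keys used by EMR on ACK.
kubectl taint nodes <node-name> emr-dedicated=true:NoSchedule
# Add a matching label so that EMR pods can select the node.
kubectl label nodes <node-name> emr-dedicated=true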
Step 3: Submit a job
After a cluster is created, you can submit jobs. This section describes how to submit a Spark job by using a custom resource definition (CRD). For more information about Spark, see Quick Start. When you view information in Quick Start, select a programming language type and a Spark version.
For information about how to submit different types of jobs, see the following topics:
Connect to an Alibaba Cloud Container Service for Kubernetes (ACK) cluster by using kubectl. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Create a job file named spark-pi.yaml. The following code shows the content in the file:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi-simple
spec:
  type: Scala
  sparkVersion: 3.2.1
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/spark-examples.jar"
  arguments:
    - "1000"
  driver:
    cores: 1
    coreLimit: 1000m
    memory: 4g
  executor:
    cores: 1
    coreLimit: 1000m
    memory: 8g
    memoryOverhead: 1g
    instances: 1
For information about the fields in the code, see spark-on-k8s-operator.
Note: You can specify a custom file name. In this example, spark-pi.yaml is used.
In this example, Spark 3.2.1 for EMR V5.6.0 is used. If you use another version of Spark, configure the sparkVersion parameter based on your business requirements.
Run the following command to submit a job:
kubectl apply -f spark-pi.yaml --namespace <Namespace in which the cluster resides>
Replace <Namespace in which the cluster resides> with the namespace based on your business requirements. To view the namespace, log on to the EMR console and go to the Cluster Details tab.
The following information is returned:
sparkapplication.sparkoperator.k8s.io/spark-pi-simple created
Note: spark-pi-simple is the name of the submitted Spark job.
Optional. View the information about the submitted Spark job on the Job Details tab.
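You can also check the status of the submitted job from the command line by querying the SparkApplication resource. The following commands are a minimal sketch; use the same namespace that you specified when you submitted the job.
# Check the status of the SparkApplication resource.
kubectl get sparkapplication spark-pi-simple --namespace <Namespace in which the cluster resides>
# View detailed status and events for the job.
kubectl describe sparkapplication spark-pi-simple --namespace <Namespace in which the cluster resides>
# View the driver logs. By default, the spark-on-k8s-operator names the driver pod
# <job name>-driver; confirm the actual pod name with kubectl get pods if needed.
kubectl logs spark-pi-simple-driver --namespace <Namespace in which the cluster resides>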
Step 4: (Optional) Release the cluster
If you no longer require a cluster, you can release the cluster to reduce costs.
On the EMR on ACK page, find the cluster that you want to release and click Release in the Actions column.
In the Release Cluster message, click OK.
References
For information about how to view clusters in the current Alibaba Cloud account, see View cluster information.
For information about how to view jobs in your cluster, see View jobs.