By default, E-MapReduce (EMR) on Container Service for Kubernetes (ACK) runs Spark jobs on nodes that use the x86 architecture. You can also run Spark jobs on ARM-based virtual nodes (elastic container instances). This topic describes how to run Spark jobs on ARM-based virtual nodes.
Prerequisites
A Spark cluster is created on the EMR on ACK page of the EMR console. For more information, see Getting started.
Elastic Container Instance is activated. For more information, see Use elastic container instances.
Introduction to EMR and EMR on ACK
EMR is an open source big data processing solution provided by Alibaba Cloud. For more information, see What is E-MapReduce?
EMR on ACK provides a platform on which you can develop and run big data jobs. You can deploy open source big data services in ACK clusters and use ACK to manage the containerized applications. This reduces the O&M costs of the underlying cluster resources and allows you to focus on your big data jobs. For more information, see Overview of EMR on ACK.
Procedure
Add virtual nodes to an ACK cluster. For more information, see Method 2: Add ARM-based virtual nodes.
Submit Spark jobs in an EMR on ACK cluster. For more information, see Submit a Spark job.
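After the virtual nodes are added, you can verify that ARM-based nodes are available in the ACK cluster before you submit jobs. The following check is a minimal sketch, assuming that you have kubectl access to the cluster; node names and counts vary by environment.

```shell
# List the nodes in the cluster together with their CPU architecture label.
# ARM-based virtual nodes report arm64 in the ARCH column.
kubectl get nodes -L kubernetes.io/arch
```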
Method 1: Submit a Spark job by using a CRD
When you submit a job by using a Custom Resource Definition (CRD), configure the following parameters:
image: In this example, registry-vpc.cn-hangzhou.aliyuncs.com/emr/spark-py:emr-3.3.1-1.1.7-arm is used. Replace cn-hangzhou with the ID of the region that you use.
annotations: Set the value to alibabacloud.com/burst-resource: "eci_only".
nodeSelector: Set the value to kubernetes.io/arch: arm64.
Sample code:
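The following manifest is a minimal sketch of such a job, assuming that the Spark operator's SparkApplication CRD is available in the cluster. The application name, main application file, and resource sizes are placeholders; adjust them to match your workload.

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi-arm                     # Hypothetical job name.
spec:
  type: Python
  mode: cluster
  sparkVersion: "3.3.1"
  # ARM-based image. Replace cn-hangzhou with the ID of the region that you use.
  image: registry-vpc.cn-hangzhou.aliyuncs.com/emr/spark-py:emr-3.3.1-1.1.7-arm
  mainApplicationFile: "local:///opt/spark/examples/src/main/python/pi.py"
  driver:
    cores: 1
    memory: "4g"
    annotations:
      alibabacloud.com/burst-resource: "eci_only"    # Run the driver only on elastic container instances.
    nodeSelector:
      kubernetes.io/arch: arm64                      # Schedule the driver to ARM-based nodes.
  executor:
    cores: 1
    instances: 2
    memory: "8g"
    annotations:
      alibabacloud.com/burst-resource: "eci_only"    # Run executors only on elastic container instances.
    nodeSelector:
      kubernetes.io/arch: arm64                      # Schedule executors to ARM-based nodes.
```

You can save the manifest to a file and submit it by running kubectl apply -f against the namespace of your Spark cluster.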
Method 2: Run Spark jobs based on a Spark configuration file
You can configure the image, annotations, and node selector in a Spark configuration file to run Spark jobs on ARM-based nodes.
Go to the spark-defaults.conf tab.
Log on to the EMR console. On the EMR on ACK page, find the cluster that you want to manage, and then click Configure in the Actions column.
On the Configure tab, click the spark-defaults.conf tab.
Enable Elastic Container Instance for the Spark cluster.
On the spark-defaults.conf tab, click Add Configuration Item.
In the Add Configuration Item dialog box, add the configuration items that are described in the following table.
| Key | Description | Example |
| --- | --- | --- |
| spark.kubernetes.container.image | The Spark image. Note: Replace cn-hangzhou in the example with the ID of the region that you use. | registry-vpc.cn-hangzhou.aliyuncs.com/emr/spark-py:emr-3.3.1-1.1.7-arm |
| spark.kubernetes.driver.annotation.alibabacloud.com/burst-resource | Specifies whether the Spark driver uses Elastic Container Instance to run Spark jobs. | eci_only |
| spark.kubernetes.driver.node.selector.kubernetes.io/arch | The node selector of the Spark driver. | arm64 |
| spark.kubernetes.executor.annotation.alibabacloud.com/burst-resource | Specifies whether the Spark executor uses Elastic Container Instance to run Spark jobs. | eci_only |
| spark.kubernetes.executor.node.selector.kubernetes.io/arch | The node selector of the Spark executor. | arm64 |
Click OK.
In the dialog box that appears, configure the Execution Reason parameter and click Save.
Deploy the configurations.
In the lower part of the Configure tab, click Deploy Client Configuration.
In the dialog box that appears, configure the Execution Reason parameter and click OK.
In the Confirm message, click OK.
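After the client configuration is deployed, the spark-defaults.conf file of the cluster should contain entries equivalent to the following sketch (shown here for the cn-hangzhou region; replace the region ID as needed):

```
spark.kubernetes.container.image                                       registry-vpc.cn-hangzhou.aliyuncs.com/emr/spark-py:emr-3.3.1-1.1.7-arm
spark.kubernetes.driver.annotation.alibabacloud.com/burst-resource     eci_only
spark.kubernetes.driver.node.selector.kubernetes.io/arch               arm64
spark.kubernetes.executor.annotation.alibabacloud.com/burst-resource   eci_only
spark.kubernetes.executor.node.selector.kubernetes.io/arch             arm64
```

With these defaults in place, Spark jobs that you submit to the cluster start their driver and executor pods on ARM-based elastic container instances without per-job configuration.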