You can manage jobs in the Alibaba Cloud E-MapReduce (EMR) console or by using kubectl or APIs. This topic describes how to use kubectl to manage Spark jobs.

Prerequisites

A Spark cluster is created on the EMR on ACK page of the new EMR console. For more information, see Getting started.

Procedure

  1. Connect to an Alibaba Cloud Container Service for Kubernetes (ACK) cluster by using kubectl. For more information, see Connect to ACK clusters by using kubectl.
    You can also connect to the ACK cluster by calling an API operation. For more information, see Use the Kubernetes API.
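    To verify the connection, you can run a standard kubectl command such as the following one:
      kubectl get nodes
    If the worker nodes of the ACK cluster are listed, kubectl is connected to the cluster.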
  2. Run the following commands to manage a job.
    • View the job status. Syntax:
      kubectl describe SparkApplication <Job name> --namespace <Namespace in which the cluster resides>
      Information similar to the following output is returned:
      Name:         spark-pi-simple
      Namespace:    c-48e779e0d9ad****
      Labels:       <none>
      Annotations:  <none>
      API Version:  sparkoperator.k8s.io/v1beta2
      Kind:         SparkApplication
      Metadata:
        Creation Timestamp:  2021-07-22T06:25:33Z
        Generation:          1
        Resource Version:  7503740
        UID:               930874ad-bb17-47f1-a556-55118c1d****
      Spec:
        Arguments:
          1000
        Driver:
          Core Limit:  1000m
          Cores:       1
          Memory:      4g
        Executor:
          Core Limit:           1000m
          Cores:                1
          Instances:            1
          Memory:               8g
          Memory Overhead:      1g
        Image:                  registry-vpc.cn-hangzhou.aliyuncs.com/emr/spark:emr-2.4.5-1.0.0
        Main Application File:  local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.4.5.jar
        Main Class:             org.apache.spark.examples.SparkPi
        Spark Version:          2.4.5
        Type:                   Scala
      Status:
        Application State:
          State:  RUNNING
        Driver Info:
          Pod Name:                spark-pi-simple-driver
          Web UI Address:          172.16.230.240:4040
          Web UI Ingress Address:  spark-pi-simple.c-48e779e0d9ad4bfd.c7f6b768c34764c27ab740bdb1fc2a3ff.cn-hangzhou.alicontainer.com
          Web UI Ingress Name:     spark-pi-simple-ui-ingress
          Web UI Port:             4040
          Web UI Service Name:     spark-pi-simple-ui-svc
        Execution Attempts:        1
        Executor State:
          spark-pi-1626935142670-exec-1:  RUNNING
        Last Submission Attempt Time:     2021-07-22T06:25:33Z
        Spark Application Id:             spark-15b44f956ecc40b1ae59a27ca18d****
        Submission Attempts:              1
        Submission ID:                    d71f30e2-9bf8-4da1-8412-b585fd45****
        Termination Time:                 <nil>
      Events:
        Type    Reason                     Age   From            Message
        ----    ------                     ----  ----            -------
        Normal  SparkApplicationAdded      17s   spark-operator  SparkApplication spark-pi-simple was added, enqueuing it for submission
        Normal  SparkApplicationSubmitted  14s   spark-operator  SparkApplication spark-pi-simple was submitted successfully
        Normal  SparkDriverRunning         13s   spark-operator  Driver spark-pi-simple-driver is running
        Normal  SparkExecutorPending       7s    spark-operator  Executor spark-pi-1626935142670-exec-1 is pending
        Normal  SparkExecutorRunning       6s    spark-operator  Executor spark-pi-1626935142670-exec-1 is running
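
      In the preceding output, Application State indicates the overall status of the job, Executor State lists the status of each executor, and Web UI Ingress Address is the address that you can use to access the web UI of the job.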

      Replace <Namespace in which the cluster resides> with the namespace of your cluster. To view the namespace, log on to the EMR console and go to the Cluster Details tab.

      To obtain the job name, go to the Cluster Details tab in the EMR console. The created jobs are displayed in the Jobs section.
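
      If you want to check the status of all Spark jobs in a namespace at a time, you can also run a standard kubectl list query. The following example assumes that the namespace is c-d2232227b95145d3:
      kubectl get sparkapplications -n c-d2232227b95145d3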

    • Terminate and delete a job. Syntax:
      kubectl delete SparkApplication <Job name> -n <Namespace in which the cluster resides>
      The following information is returned:
      sparkapplication.sparkoperator.k8s.io "spark-pi-simple" deleted
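      For example, if the job name is spark-pi-simple and the namespace is c-d2232227b95145d3, the command you run is kubectl delete SparkApplication spark-pi-simple -n c-d2232227b95145d3. After the SparkApplication object is deleted, the driver and executor pods of the job are also terminated.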
    • View the logs of a job. Syntax:
      kubectl logs <Job name>-driver -n <Namespace in which the cluster resides>
      Note: If the job name is spark-pi-simple and the namespace is c-d2232227b95145d3, the command you run is kubectl logs spark-pi-simple-driver -n c-d2232227b95145d3.
      Information similar to the following output is returned:
      ......
      Pi is roughly 3.141488791414888
      21/07/22 14:37:57 INFO SparkContext: Successfully stopped SparkContext
      21/07/22 14:37:57 INFO ShutdownHookManager: Shutdown hook called
      21/07/22 14:37:57 INFO ShutdownHookManager: Deleting directory /var/data/spark-b6a43b55-a354-44d7-ae5e-45b8b1493edb/spark-56aae0d1-37b9-4a7d-9c99-4e4ca12deb4b
      21/07/22 14:37:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-e2500491-6ed7-48d7-b94e-a9ebeb899320
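
      To stream the driver logs of a running job instead of fetching a one-time snapshot, you can add the standard -f flag of kubectl. You can also view the logs of an individual executor by specifying the executor pod name that is shown in the output of the describe command. The following examples assume the same job name and namespace as above:
      kubectl logs -f spark-pi-simple-driver -n c-d2232227b95145d3
      kubectl logs spark-pi-1626935142670-exec-1 -n c-d2232227b95145d3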