You can manage jobs in the Alibaba Cloud E-MapReduce (EMR) console or by using kubectl or APIs. This topic describes how to use kubectl to manage Spark jobs.

Prerequisites

A Spark cluster is created on the EMR on ACK page of the new EMR console. For more information, see Getting started.

Procedure

  1. Connect to an Alibaba Cloud Container Service for Kubernetes (ACK) cluster by using kubectl. For more information, see Connect to ACK clusters by using kubectl.
    You can also connect to the ACK cluster by calling an API operation. For more information, see Use the Kubernetes API.
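    To verify the connection, you can run a standard kubectl command such as the following one:
      kubectl get nodes
    If the worker nodes of the ACK cluster are listed, kubectl is connected to the cluster.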
  2. Run the following commands to manage a job.
    • View the job status. Syntax:
      kubectl describe SparkApplication <Job name> --namespace <Namespace in which the cluster resides>
      Information similar to the following output is returned:
      Name:         spark-pi-simple
      Namespace:    c-48e779e0d9ad****
      Labels:       <none>
      Annotations:  <none>
      API Version:  sparkoperator.k8s.io/v1beta2
      Kind:         SparkApplication
      Metadata:
        Creation Timestamp:  2021-07-22T06:25:33Z
        Generation:          1
        Resource Version:  7503740
        UID:               930874ad-bb17-47f1-a556-55118c1d****
      Spec:
        Arguments:
          1000
        Driver:
          Core Limit:  1000m
          Cores:       1
          Memory:      4g
        Executor:
          Core Limit:           1000m
          Cores:                1
          Instances:            1
          Memory:               8g
          Memory Overhead:      1g
        Image:                  registry-vpc.cn-hangzhou.aliyuncs.com/emr/spark:emr-2.4.5-1.0.0
        Main Application File:  local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.4.5.jar
        Main Class:             org.apache.spark.examples.SparkPi
        Spark Version:          2.4.5
        Type:                   Scala
      Status:
        Application State:
          State:  RUNNING
        Driver Info:
          Pod Name:                spark-pi-simple-driver
          Web UI Address:          172.16.230.240:4040
          Web UI Ingress Address:  spark-pi-simple.c-48e779e0d9ad4bfd.c7f6b768c34764c27ab740bdb1fc2a3ff.cn-hangzhou.alicontainer.com
          Web UI Ingress Name:     spark-pi-simple-ui-ingress
          Web UI Port:             4040
          Web UI Service Name:     spark-pi-simple-ui-svc
        Execution Attempts:        1
        Executor State:
          spark-pi-1626935142670-exec-1:  RUNNING
        Last Submission Attempt Time:     2021-07-22T06:25:33Z
        Spark Application Id:             spark-15b44f956ecc40b1ae59a27ca18d****
        Submission Attempts:              1
        Submission ID:                    d71f30e2-9bf8-4da1-8412-b585fd45****
        Termination Time:                 <nil>
      Events:
        Type    Reason                     Age   From            Message
        ----    ------                     ----  ----            -------
        Normal  SparkApplicationAdded      17s   spark-operator  SparkApplication spark-pi-simple was added, enqueuing it for submission
        Normal  SparkApplicationSubmitted  14s   spark-operator  SparkApplication spark-pi-simple was submitted successfully
        Normal  SparkDriverRunning         13s   spark-operator  Driver spark-pi-simple-driver is running
        Normal  SparkExecutorPending       7s    spark-operator  Executor spark-pi-1626935142670-exec-1 is pending
        Normal  SparkExecutorRunning       6s    spark-operator  Executor spark-pi-1626935142670-exec-1 is running
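
      In the preceding output, Application State indicates the overall status of the job, Executor State lists the status of each executor, and Web UI Ingress Address is the address that you can use to access the web UI of the job.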

      Replace <Namespace in which the cluster resides> with the namespace of your cluster. To view the namespace, log on to the EMR console and go to the Cluster Details tab.

      To obtain the job name, go to the Cluster Details tab in the EMR console. The created jobs are displayed in the Jobs section.
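
      If you want to check the status of all Spark jobs in a namespace at a time, you can also run a standard kubectl list query. The following example assumes that the namespace is c-d2232227b95145d3:
      kubectl get sparkapplications -n c-d2232227b95145d3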

    • Terminate and delete a job. Syntax:
      kubectl delete SparkApplication <Job name> -n <Namespace in which the cluster resides>
      The following information is returned:
      sparkapplication.sparkoperator.k8s.io "spark-pi-simple" deleted
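      For example, if the job name is spark-pi-simple and the namespace is c-d2232227b95145d3, the command you run is kubectl delete SparkApplication spark-pi-simple -n c-d2232227b95145d3. After the SparkApplication object is deleted, the driver and executor pods of the job are also terminated.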
    • View the logs of a job. Syntax:
      kubectl logs <Job name>-driver -n <Namespace in which the cluster resides>
      Note: If the job name is spark-pi-simple and the namespace is c-d2232227b95145d3, the command you run is kubectl logs spark-pi-simple-driver -n c-d2232227b95145d3.
      Information similar to the following output is returned:
      ......
      Pi is roughly 3.141488791414888
      21/07/22 14:37:57 INFO SparkContext: Successfully stopped SparkContext
      21/07/22 14:37:57 INFO ShutdownHookManager: Shutdown hook called
      21/07/22 14:37:57 INFO ShutdownHookManager: Deleting directory /var/data/spark-b6a43b55-a354-44d7-ae5e-45b8b1493edb/spark-56aae0d1-37b9-4a7d-9c99-4e4ca12deb4b
      21/07/22 14:37:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-e2500491-6ed7-48d7-b94e-a9ebeb899320
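
      To stream the driver logs of a running job instead of fetching a one-time snapshot, you can add the standard -f flag of kubectl. You can also view the logs of an individual executor by specifying the executor pod name that is shown in the output of the describe command. The following examples assume the same job name and namespace as above:
      kubectl logs -f spark-pi-simple-driver -n c-d2232227b95145d3
      kubectl logs spark-pi-1626935142670-exec-1 -n c-d2232227b95145d3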