Running large-scale Spark workloads on Kubernetes requires automating job deployment, resource allocation, and lifecycle management. The ack-spark-operator component handles these tasks, letting you define Spark jobs as Kubernetes resources and manage them with standard kubectl commands. This topic walks you through installing Spark Operator in an ACK cluster, submitting a Spark job, monitoring execution, and cleaning up resources.
How it works
Container Service for Kubernetes (ACK) provides the ack-spark-operator component, built on the open-source kubeflow/spark-operator. You submit and manage Spark jobs through CustomResourceDefinitions (CRDs) such as SparkApplication and ScheduledSparkApplication. Spark Operator monitors these resources and leverages Kubernetes features like auto scaling, health checks, and resource management to run jobs efficiently. For more information, see Spark Operator | Kubeflow.
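For example, a ScheduledSparkApplication wraps a regular SparkApplication spec in a cron-style schedule. The following is a minimal sketch based on the kubeflow/spark-operator v1beta2 API; the schedule, name, and image are illustrative:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: ScheduledSparkApplication
metadata:
  name: spark-pi-scheduled       # Illustrative name
  namespace: default
spec:
  schedule: "@every 1h"          # Standard cron expressions also work
  concurrencyPolicy: Forbid      # Skip a run if the previous one is still active
  template:                      # Same fields as a SparkApplication spec
    type: Scala
    mode: cluster
    image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
    mainClass: org.apache.spark.examples.SparkPi
    mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
    sparkVersion: 3.5.2
    driver:
      cores: 1
      memory: 512m
      serviceAccount: spark-operator-spark
    executor:
      instances: 1
      cores: 1
      memory: 512m
    restartPolicy:
      type: Never
```

Spark Operator watches resources of both kinds and translates them into driver and executor pods, so scheduled and one-off jobs are managed the same way.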
Why run Spark on ACK:
Declarative job management -- Define Spark jobs as Kubernetes resources. Spark Operator handles deployment and lifecycle transitions automatically.
Multi-tenancy -- Isolate teams with Kubernetes namespaces and resource quotas. Use node selection to dedicate compute resources to specific Spark workloads.
Elastic resource provisioning -- Scale with Elastic Container Instance (ECI) or elastic node pools during peak hours to balance performance and cost.
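As a sketch of the multi-tenancy point above, a dedicated namespace plus a ResourceQuota caps how much CPU and memory one team's Spark pods can consume. The namespace and quota values below are illustrative assumptions, not prescribed sizes:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a-spark              # Illustrative tenant namespace
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-spark-quota
  namespace: team-a-spark
spec:
  hard:
    requests.cpu: "20"            # Total CPU requested by all pods in the namespace
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
    pods: "50"
```

For Spark Operator to run jobs in such a namespace, the namespace must also be listed in the operator's spark.jobNamespaces parameter.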
Use cases:
Data analysis -- Interactive data exploration and cleansing with Spark.
Batch computing -- Scheduled jobs that process large datasets on a recurring basis.
Real-time processing -- Stream processing with the Spark Streaming library.
Procedure overview
Install ack-spark-operator -- Deploy Spark Operator in your ACK cluster.
Submit a Spark job -- Create a SparkApplication manifest and apply it.
Monitor the job -- Check job status, pod state, and driver logs.
Access the Spark web UI -- Port-forward the driver service to view execution details locally.
(Optional) Update the job -- Modify parameters and reapply the manifest.
(Optional) Delete the job -- Remove completed or unused Spark jobs to free resources.
Prerequisites
Before you begin, make sure you have:
An ACK Pro cluster or ACK Serverless Pro cluster running Kubernetes 1.24 or later. For more information, see Create an ACK managed cluster, Create an ACK Serverless cluster, and Manually upgrade ACK clusters.
kubectl configured to connect to the cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Step 1: Install ack-spark-operator
Log on to the ACK console. In the left-side navigation pane, choose Marketplace > Marketplace.
On the Marketplace page, click the App Catalog tab. Find and click ack-spark-operator.
On the ack-spark-operator page, click Deploy.
In the Deploy panel, select a cluster and namespace, keep the default release name, then click Next.
In the Parameters step, configure the parameters in the YAML editor as listed in the following table, then click OK.
| Parameter | Description | Default |
| --- | --- | --- |
| controller.replicas | Number of controller replicas. | 1 |
| webhook.replicas | Number of webhook replicas. | 1 |
| spark.jobNamespaces | Namespaces where Spark jobs can run. Set to `[""]` to allow all namespaces, or specify a list such as `["ns1","ns2"]`. | `["default"]` |
| spark.serviceAccount.name | Name of the ServiceAccount that Spark Operator automatically creates (along with the corresponding role-based access control (RBAC) resources) in each namespace specified by spark.jobNamespaces. Specify this name in your SparkApplication manifests. | spark-operator-spark |
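Put together, the parameters above might look like the following in the YAML editor. This is a sketch assuming the chart follows the kubeflow/spark-operator values layout implied by the table; the extra namespaces are illustrative:

```yaml
controller:
  replicas: 1
webhook:
  replicas: 1
spark:
  jobNamespaces:
    - default
    - team-a-spark               # Illustrative additional job namespaces
    - team-b-spark
  serviceAccount:
    name: spark-operator-spark   # Reference this in driver.serviceAccount
```

Every namespace listed here receives its own spark-operator-spark ServiceAccount and RBAC resources when the chart is deployed.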
Verify the installation
Run the following command to confirm that the Spark Operator pods are running:
```
kubectl get pods -n <operator-namespace>
```

Replace <operator-namespace> with the namespace you selected during installation. The output should show pods in Running status:

```
NAME                                    READY   STATUS    RESTARTS   AGE
spark-operator-controller-xxxxx-xxxxx   1/1     Running   0          60s
spark-operator-webhook-xxxxx-xxxxx      1/1     Running   0          60s
```

Step 2: Submit a Spark job
Create a SparkApplication manifest to define and submit a Spark job. The following example runs the SparkPi calculation.
Create a file named spark-pi.yaml with the following content:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default  # Must match a namespace listed in spark.jobNamespaces
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
    - "1000"
  sparkVersion: 3.5.2
  driver:
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark  # Must match spark.serviceAccount.name
  executor:
    instances: 1
    cores: 1
    coreLimit: 1200m
    memory: 512m
  restartPolicy:
    type: Never
```

Apply the manifest:

```
kubectl apply -f spark-pi.yaml
```

Expected output:

```
sparkapplication.sparkoperator.k8s.io/spark-pi created
```
Step 3: Monitor the Spark job
After submitting a job, use kubectl to track its status, inspect pods, and read logs.
Check job status
```
kubectl get sparkapplication spark-pi
```

Example output:

```
NAME       STATUS      ATTEMPTS   START                  FINISH       SUSPEND   AGE
spark-pi   SUBMITTED   1          2024-06-04T03:17:11Z   <no value>   false     15s
```

Check pod status
List pods associated with the job:
```
kubectl get pod -l sparkoperator.k8s.io/app-name=spark-pi
```

Example output while the job is running:

```
NAME                               READY   STATUS    RESTARTS   AGE
spark-pi-7272428fc8f5f392-exec-1   1/1     Running   0          13s
spark-pi-7272428fc8f5f392-exec-2   1/1     Running   0          13s
spark-pi-driver                    1/1     Running   0          49s
```

After the job completes, the driver automatically deletes all executor pods.
View job details
```
kubectl describe sparkapplication spark-pi
```

View driver logs
Read the last 20 log lines from the driver pod:
```
kubectl logs --tail=20 spark-pi-driver
```

Example output:

```
24/05/30 10:05:30 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
24/05/30 10:05:30 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 7.942 s
24/05/30 10:05:30 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
24/05/30 10:05:30 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
24/05/30 10:05:30 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 8.043996 s
Pi is roughly 3.1419522314195225
24/05/30 10:05:30 INFO SparkContext: SparkContext is stopping with exitCode 0.
24/05/30 10:05:30 INFO SparkUI: Stopped Spark web UI at http://spark-pi-1e18858fc8f56b14-driver-svc.default.svc:4040
24/05/30 10:05:30 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
24/05/30 10:05:30 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
24/05/30 10:05:30 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed.
24/05/30 10:05:30 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
24/05/30 10:05:30 INFO MemoryStore: MemoryStore cleared
24/05/30 10:05:30 INFO BlockManager: BlockManager stopped
24/05/30 10:05:30 INFO BlockManagerMaster: BlockManagerMaster stopped
24/05/30 10:05:30 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/05/30 10:05:30 INFO SparkContext: Successfully stopped SparkContext
24/05/30 10:05:30 INFO ShutdownHookManager: Shutdown hook called
24/05/30 10:05:30 INFO ShutdownHookManager: Deleting directory /var/data/spark-14ed60f1-82cd-4a33-b1b3-9e5d975c5b1e/spark-01120c89-5296-4c83-8a20-0799eef4e0ee
24/05/30 10:05:30 INFO ShutdownHookManager: Deleting directory /tmp/spark-5f98ed73-576a-41be-855d-dabdcf7de189
```

The line `Pi is roughly 3.1419522314195225` confirms the computation completed successfully.
Step 4: Access the Spark web UI
The Spark web UI displays execution metrics for a running job. It is available only while the driver pod is in the Running state; after the job completes, the web UI stops and becomes unavailable.
When ack-spark-operator is installed, the controller.uiService.enable parameter defaults to true, which automatically creates a Kubernetes Service for the web UI. If you set this parameter to false during installation, no Service is created and you must port-forward the pod directly.
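If you prefer to disable the automatically created Service, the corresponding chart value can be set in the Parameters YAML editor at install time. A minimal fragment, assuming the value path named above:

```yaml
controller:
  uiService:
    enable: false   # No web UI Service is created; port-forward the driver pod instead
```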
The kubectl port-forward command is intended for testing environments only and is not suitable for production use.
Forward the web UI port to your local machine. Choose one of the following methods:
Port-forward via Service (when controller.uiService.enable is true):

```
kubectl port-forward services/spark-pi-ui-svc 4040
```

Port-forward via pod (when no Service is available):

```
kubectl port-forward pods/spark-pi-driver 4040
```

Expected output:

```
Forwarding from 127.0.0.1:4040 -> 4040
Forwarding from [::1]:4040 -> 4040
```
Open http://127.0.0.1:4040 in your browser.
(Optional) Step 5: Update the Spark job
Modify the SparkApplication manifest and reapply it to update job parameters. The following example increases the computation from 1,000 to 10,000 iterations and scales executors from 1 to 2.
Edit spark-pi.yaml. Change the arguments and executor.instances values:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
    - "10000"
  sparkVersion: 3.5.2
  driver:
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark  # Must match spark.serviceAccount.name
  executor:
    instances: 2
    cores: 1
    coreLimit: 1200m
    memory: 512m
  restartPolicy:
    type: Never
```

Reapply the manifest:

```
kubectl apply -f spark-pi.yaml
```

Verify that the job restarts with the new configuration:

```
kubectl get sparkapplication spark-pi
```

Expected output:

```
NAME       STATUS    ATTEMPTS   START                  FINISH       SUSPEND   AGE
spark-pi   RUNNING   1          2024-06-04T03:37:34Z   <no value>   false     20m
```
(Optional) Step 6: Delete the Spark job
Delete the Spark job to release cluster resources.
Delete by manifest:

```
kubectl delete -f spark-pi.yaml
```

Alternatively, delete by resource name:

```
kubectl delete sparkapplication spark-pi
```