By Alwyn Botha, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud's incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.
This tutorial describes how to run 3 types of Kubernetes Jobs: simple Jobs that run a single Pod to completion, parallel Jobs with a fixed completion count, and parallel Jobs that process a shared work queue.
Other Pods run continuously forever ( for example a web server or a database ).
All 3 types of Jobs above have a fixed ( batch ) piece of work to do. They finish it and then their status becomes Completed.
Create the following as your simple job example YAML spec file.
Note the kind is Job. The rest of the spec is the same as that of a normal Pod.
nano myJob.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  template:
    spec:
      containers:
      - name: myjob
        image: alpine
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c', 'echo Job Pod is Running ; sleep 10']
      restartPolicy: Never
      terminationGracePeriodSeconds: 0
A Job does its work using Pods.
Pods do their work using containers.
Containers run Docker images. Our example above uses the Alpine Linux Docker image to do a job: sleep 10 seconds.
Even this simple example will teach us about Kubernetes jobs.
Create the Job
kubectl create -f myJob.yaml
job.batch/myjob created
Let's list all Pods by running kubectl get pods several times.
kubectl get pods
NAME READY STATUS RESTARTS AGE
myjob-hbhtl 1/1 Running 0 2s
NAME READY STATUS RESTARTS AGE
myjob-hbhtl 1/1 Running 0 4s
NAME READY STATUS RESTARTS AGE
myjob-hbhtl 1/1 Running 0 6s
NAME READY STATUS RESTARTS AGE
myjob-hbhtl 1/1 Running 0 9s
NAME READY STATUS RESTARTS AGE
myjob-hbhtl 0/1 Completed 0 12s
NAME READY STATUS RESTARTS AGE
myjob-hbhtl 0/1 Completed 0 16s
We see this Pod ran for around 10 seconds. Then its status turned to Completed.
In the READY column we see that after 10 seconds the Pod is no longer ready. It is complete.
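Since the Job controller labels its Pods with job-name, you can also list only this Job's Pods ( a small convenience; the label is added automatically by Kubernetes ):
kubectl get pods -l job-name=myjob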
Delete Job
kubectl delete -f myJob.yaml
job "myjob" deleted
We just used kubectl get pods to monitor the progress of a job.
This example will use kubectl get job to monitor progress.
kubectl create -f myJob.yaml
job.batch/myjob created
Repeatedly run kubectl get job
kubectl get job
NAME COMPLETIONS DURATION AGE
myjob 0/1 3s 3s
NAME COMPLETIONS DURATION AGE
myjob 0/1 5s 5s
NAME COMPLETIONS DURATION AGE
myjob 0/1 7s 7s
NAME COMPLETIONS DURATION AGE
myjob 0/1 9s 9s
NAME COMPLETIONS DURATION AGE
myjob 1/1 12s 12s
Only the last line shows COMPLETIONS 1/1.
Frankly, this is not very informative output on its own. With more complex jobs later it will reveal its value.
With experience you will learn when to use which: kubectl get pods for Pod-level detail, kubectl get job for an overall progress summary.
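As an alternative to rerunning these commands by hand, kubectl's --watch flag streams status changes as they happen ( a convenience sketch; the exact output depends on your cluster ):
kubectl get pods --watch
kubectl get job myjob --watch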
Demo complete; delete the Job:
kubectl delete -f myJob.yaml
job.batch "myjob" deleted
If your job has an unrecoverable error you may want to prevent it from continuously trying to run.
backoffLimit specifies the number of retries before a Job is marked as failed. The default is 6.
We set it to 2 in the example so that we can see it in action within seconds.
nano myJob.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  template:
    spec:
      containers:
      - name: myjob
        image: alpine
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c', 'echo Job Pod is Running ; exit 1']
      restartPolicy: Never
      terminationGracePeriodSeconds: 0
  backoffLimit: 2
Note the exit 1 error exit code. This container will start up and exit immediately with an error condition.
Create the Job
kubectl create -f myJob.yaml
job/myjob created
Monitor job progress several times:
kubectl get pods
NAME READY STATUS RESTARTS AGE
myjob-drzhx 0/1 ContainerCreating 0 2s
myjob-hwdc5 0/1 Error 0 3s
NAME READY STATUS RESTARTS AGE
myjob-drzhx 0/1 Error 0 5s
myjob-hwdc5 0/1 Error 0 6s
NAME READY STATUS RESTARTS AGE
myjob-drzhx 0/1 Error 0 9s
myjob-hwdc5 0/1 Error 0 10s
NAME READY STATUS RESTARTS AGE
myjob-drzhx 0/1 Error 0 14s
myjob-gks4f 0/1 Error 0 4s
myjob-hwdc5 0/1 Error 0 15s
backoffLimit: 2 ... the Job stops creating new Pods once the third error occurs.
Describe details about our Job ( only relevant fields shown ):
kubectl describe job/myjob
Name: myjob
Parallelism: 1
Completions: 1
Start Time: Thu, 24 Jan 2019 07:50:07 +0200
Pods Statuses: 0 Running / 0 Succeeded / 3 Failed
Pod Template:
Containers:
myjob:
Command:
sh
-c
echo Job Pod is Running ; exit 1
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 35s job-controller Created pod: myjob-hwdc5
Normal SuccessfulCreate 34s job-controller Created pod: myjob-drzhx
Normal SuccessfulCreate 24s job-controller Created pod: myjob-gks4f
Warning BackoffLimitExceeded 4s job-controller Job has reached the specified backoff limit
Most informative lines are:
Pods Statuses: 0 Running / 0 Succeeded / 3 Failed
Warning BackoffLimitExceeded 4s job-controller Job has reached the specified backoff limit
We have not discussed Parallelism: 1 and Completions: 1 yet. They simply mean that one Pod runs at a time and one Pod must complete for the Job to be considered complete.
Determine a suitable backoffLimit for each of your production batch jobs.
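In scripts it can help to block until a Job either completes or gives up, and then read the failure reason from the Job status. A hedged sketch ( the 120s timeout is an arbitrary choice ):
# wait until the Job reports the Complete condition, or give up after the timeout
kubectl wait --for=condition=complete job/myjob --timeout=120s
# read the failure reason ( e.g. BackoffLimitExceeded ) from the Job status
kubectl get job myjob -o jsonpath='{.status.conditions[?(@.type=="Failed")].reason}'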
Delete Job
kubectl delete -f myJob.yaml
job "myjob" deleted
Job completions specify how many Pods must complete successfully for the Job to be considered complete.
Our Job's Pod below has one piece of work: sleep 3 seconds.
4 Pods must complete ... completions: 4
nano myJob.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  template:
    spec:
      containers:
      - name: myjob
        image: alpine
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c', 'echo Job Pod is Running ; sleep 3']
      restartPolicy: Never
      terminationGracePeriodSeconds: 0
  backoffLimit: 2
  completions: 4
Create the Job
kubectl create -f myJob.yaml
job/myjob created
This time follow progress using kubectl get jobs myjob repeatedly.
kubectl get jobs myjob
NAME COMPLETIONS DURATION AGE
myjob 0/4 2s 2s
NAME COMPLETIONS DURATION AGE
myjob 0/4 4s 4s
NAME COMPLETIONS DURATION AGE
myjob 1/4 6s 6s
NAME COMPLETIONS DURATION AGE
myjob 1/4 9s 9s
NAME COMPLETIONS DURATION AGE
myjob 2/4 12s 12s
NAME COMPLETIONS DURATION AGE
myjob 3/4 14s 14s
NAME COMPLETIONS DURATION AGE
myjob 3/4 16s 16s
NAME COMPLETIONS DURATION AGE
myjob 4/4 18s 19s
We can clearly see the 4 Pods each sleep 3 seconds and complete successfully one by one, not all simultaneously.
Delete Job
kubectl delete -f myJob.yaml
job "myjob" deleted
Same job but another demo: this time we monitor progress using kubectl get po
Create the Job
kubectl create -f myJob.yaml
job/myjob created
Monitor progress:
kubectl get po
NAME READY STATUS RESTARTS AGE
myjob-74f5q 0/1 ContainerCreating 0 1s
NAME READY STATUS RESTARTS AGE
myjob-74f5q 1/1 Running 0 4s
NAME READY STATUS RESTARTS AGE
myjob-74f5q 0/1 Completed 0 7s
myjob-7gf7x 1/1 Running 0 2s
NAME READY STATUS RESTARTS AGE
myjob-74f5q 0/1 Completed 0 9s
myjob-7gf7x 0/1 Completed 0 4s
myjob-pfssq 0/1 ContainerCreating 0 0s
NAME READY STATUS RESTARTS AGE
myjob-74f5q 0/1 Completed 0 11s
myjob-7gf7x 0/1 Completed 0 6s
myjob-pfssq 0/1 ContainerCreating 0 2s
NAME READY STATUS RESTARTS AGE
myjob-74f5q 0/1 Completed 0 13s
myjob-7gf7x 0/1 Completed 0 8s
myjob-pfssq 1/1 Running 0 4s
NAME READY STATUS RESTARTS AGE
myjob-74f5q 0/1 Completed 0 17s
myjob-7gf7x 0/1 Completed 0 12s
myjob-pbmbt 1/1 Running 0 3s
myjob-pfssq 0/1 Completed 0 8s
NAME READY STATUS RESTARTS AGE
myjob-74f5q 0/1 Completed 0 19s
myjob-7gf7x 0/1 Completed 0 14s
myjob-pbmbt 0/1 Completed 0 5s
myjob-pfssq 0/1 Completed 0 10s
Note that only 1 Pod runs at a time. ( Default parallelism is one )
Delete Job
kubectl delete -f myJob.yaml
job "myjob" deleted
I am running this on a 4-core server.
If you are running these exercises on a server with at least 2 cores, the following demo will work.
parallelism: 2 below specifies that we want to run 2 Pods simultaneously.
nano myJob.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  template:
    spec:
      containers:
      - name: myjob
        image: alpine
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c', 'echo Job Pod is Running ; sleep 3']
      restartPolicy: Never
      terminationGracePeriodSeconds: 0
  backoffLimit: 2
  completions: 4
  parallelism: 2
Create the Job
kubectl create -f myJob.yaml
job/myjob created
Monitor progress.
kubectl get pods
NAME READY STATUS RESTARTS AGE
myjob-11111 0/1 ContainerCreating 0 2s
myjob-22222 0/1 ContainerCreating 0 2s
NAME READY STATUS RESTARTS AGE
myjob-11111 1/1 Running 0 4s
myjob-22222 1/1 Running 0 4s
NAME READY STATUS RESTARTS AGE
myjob-33333 1/1 Running 0 1s
myjob-44444 1/1 Running 0 1s
myjob-11111 0/1 Completed 0 7s
myjob-22222 0/1 Completed 0 7s
NAME READY STATUS RESTARTS AGE
myjob-33333 1/1 Running 0 3s
myjob-44444 1/1 Running 0 3s
myjob-11111 0/1 Completed 0 9s
myjob-22222 0/1 Completed 0 9s
NAME READY STATUS RESTARTS AGE
myjob-33333 0/1 Completed 0 5s
myjob-44444 0/1 Completed 0 5s
myjob-11111 0/1 Completed 0 11s
myjob-22222 0/1 Completed 0 11s
We needed completions: 4. Therefore 2 Pods were run in parallel twice to get the completions done.
Next we monitor progress using kubectl get jobs myjob
Delete Job
kubectl delete -f myJob.yaml
job "myjob" deleted
Same job created again.
kubectl create -f myJob.yaml
job.batch/myjob created
Monitor:
kubectl get jobs myjob
NAME COMPLETIONS DURATION AGE
myjob 0/4 2s 2s
NAME COMPLETIONS DURATION AGE
myjob 0/4 5s 5s
NAME COMPLETIONS DURATION AGE
myjob 2/4 9s 9s
NAME COMPLETIONS DURATION AGE
myjob 4/4 9s 11s
As expected, COMPLETIONS increase in multiples of 2.
If you are using a server with at least 4 CPU cores you can run this example:
We need 4 completions - all 4 run in parallel.
nano myJob.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  template:
    spec:
      containers:
      - name: myjob
        image: alpine
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c', 'echo Job Pod is Running ; sleep 3']
      restartPolicy: Never
      terminationGracePeriodSeconds: 0
  backoffLimit: 2
  completions: 4
  parallelism: 4
Create the Job
kubectl create -f myJob.yaml
job/myjob created
Monitor:
kubectl get pods
NAME READY STATUS RESTARTS AGE
myjob-2j2fz 1/1 Running 0 2s
myjob-hkzmh 1/1 Running 0 2s
myjob-ngxps 1/1 Running 0 2s
myjob-skrbd 1/1 Running 0 2s
NAME READY STATUS RESTARTS AGE
myjob-2j2fz 1/1 Running 0 4s
myjob-hkzmh 1/1 Running 0 4s
myjob-ngxps 1/1 Running 0 4s
myjob-skrbd 1/1 Running 0 4s
NAME READY STATUS RESTARTS AGE
myjob-2j2fz 0/1 Completed 0 6s
myjob-hkzmh 0/1 Completed 0 6s
myjob-ngxps 0/1 Completed 0 6s
myjob-skrbd 0/1 Completed 0 6s
As expected 4 Pods running simultaneously.
Not shown - kubectl get jobs myjob ... what do you expect the output to look like?
Delete Job
kubectl delete -f myJob.yaml
job "myjob" deleted
activeDeadlineSeconds specifies the total runtime for the Job as a whole.
Once a Job reaches activeDeadlineSeconds, all of its Pods are terminated and the Job status becomes type: Failed with reason: DeadlineExceeded.
This exercise demonstrates DeadlineExceeded.
Note the last line below: an absurdly low activeDeadlineSeconds: 3.
nano myJob.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  template:
    spec:
      containers:
      - name: myjob
        image: alpine
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c', 'echo Job Pod is Running ; sleep 4']
      restartPolicy: Never
      terminationGracePeriodSeconds: 0
  backoffLimit: 2
  completions: 4
  parallelism: 4
  activeDeadlineSeconds: 3
Create the Job
kubectl create -f myJob.yaml
job/myjob created
Repeatedly monitor progress:
kubectl get pods
NAME READY STATUS RESTARTS AGE
myjob-7xlch 1/1 Running 0 2s
myjob-8b9wl 1/1 Running 0 2s
myjob-qvpc7 1/1 Running 0 2s
myjob-rnhrg 1/1 Running 0 2s
kubectl get pods
No resources found.
Note that after 3 seconds the Pods no longer exist. We will see why below.
Check job status:
kubectl get jobs myjob
NAME COMPLETIONS DURATION AGE
myjob 0/4 20s 20s
Disappointingly, there is NO indication that we have a problem Job.
Describe details about the Job:
kubectl describe job/myjob
Name: myjob
Namespace: default
Parallelism: 4
Completions: 4
Start Time: Thu, 24 Jan 2019 08:25:53 +0200
Active Deadline Seconds: 3s
Pods Statuses: 0 Running / 0 Succeeded / 4 Failed
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 8s job-controller Created pod: myjob-5rcwp
Normal SuccessfulCreate 8s job-controller Created pod: myjob-nbdz4
Normal SuccessfulCreate 8s job-controller Created pod: myjob-4jb28
Normal SuccessfulCreate 7s job-controller Created pod: myjob-vb765
Normal SuccessfulDelete 4s job-controller Deleted pod: myjob-vb765
Normal SuccessfulDelete 4s job-controller Deleted pod: myjob-4jb28
Normal SuccessfulDelete 4s job-controller Deleted pod: myjob-5rcwp
Normal SuccessfulDelete 4s job-controller Deleted pod: myjob-nbdz4
Warning DeadlineExceeded 4s job-controller Job was active longer than specified deadline
Much more helpful: 0 Succeeded / 4 Failed
Also informative: Warning DeadlineExceeded ... Job was active longer than specified deadline
Now we see why our Pods disappeared ... they were deleted.
I do not understand the logic of deleting these Pods: their logs are now gone as well.
Let's attempt to investigate the failed Pods' logs:
kubectl logs myjob-5rcwp
Error from server (NotFound): pods "myjob-5rcwp" not found
NOTE: use activeDeadlineSeconds only when you have successfully resolved this missing logs issue.
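If you only need the failure reason ( not the Pod logs ), the Job object itself still records it. A small sketch reading it with jsonpath:
kubectl get job myjob -o jsonpath='{.status.conditions[?(@.type=="Failed")].reason}'
# prints: DeadlineExceeded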
Delete Job
kubectl delete -f myJob.yaml
job "myjob" deleted
The Kubernetes documentation contains 2 complex examples of parallel Jobs.
More than 80% of those guides focus on setting up the work queue functionality.
This tutorial focuses on learning about parallel Jobs and queues using my very simple bash implementation.
We need a directory for our bash work queue functionality:
mkdir /usr/share/jobdemo
Our work queue:
nano /usr/share/jobdemo/workqueue
1
2
3
4
5
6
7
8
9
10
11
12
We will need this backup later ( Running Pods will delete lines from our workqueue file )
cp /usr/share/jobdemo/workqueue /usr/share/jobdemo/workqueue-backup
Work queue processing script:
nano /usr/share/jobdemo/jobscript.sh
#!/bin/bash
# process up to 12 work items from the shared queue file
for counter in 1 2 3 4 5 6 7 8 9 10 11 12
do
  if [ -s /usr/share/jobdemo/workqueue ]    # does the queue file still contain lines?
  then
    echo " did some work "
    sed -i '1d' /usr/share/jobdemo/workqueue    # take ( delete ) the first line of the queue
    sleep 1
  else
    echo " no more work left "
    exit 0
  fi
done
exit 0
Program explanation ( the focus is on getting a minimal work queue just barely working ):
Basically, every Pod deletes whatever first line it finds in the workqueue file and echoes that it did some work.
When a Pod finds the workqueue empty it exits with code 0, which means success.
Seeing this in action several times will make it more clear.
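As a quick local sanity check ( no Kubernetes involved; this assumes the two files above are in place, and it really does consume the queue, so restore it from the backup afterwards ):
bash /usr/share/jobdemo/jobscript.sh
cat /usr/share/jobdemo/workqueue
cp /usr/share/jobdemo/workqueue-backup /usr/share/jobdemo/workqueue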
We need to place the workqueue file and the jobscript on a persistent volume so that all Pods use the same workqueue file. Every Pod will take work from the SAME queue and delete the line ( work item ) it took from the queue.
nano myPersistent-Volume.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: my-persistent-volume
  labels:
    type: local
spec:
  storageClassName: pv-demo
  capacity:
    storage: 10Mi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/usr/share/jobdemo"
This creates a 10Mi Persistent Volume pointing to the path that holds our workqueue file and job script.
Reference : https://kubernetes.io/docs/concepts/storage/volumes/
nano myPersistent-VolumeClaim.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-persistent-volumeclaim
spec:
  storageClassName: pv-demo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Mi
The claim uses storageClassName: pv-demo, which points it to the Persistent Volume above.
Reference : https://kubernetes.io/docs/concepts/storage/persistent-volumes/
kubectl create -f myPersistent-Volume.yaml
kubectl create -f myPersistent-VolumeClaim.yaml
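Optionally verify that the claim is Bound to the volume before creating the Job ( output varies slightly by cluster ):
kubectl get pv,pvc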
This example uses one Pod to read and process a workqueue until it is empty.
Create myWorkqueue-Job.yaml
nano myWorkqueue-Job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  template:
    spec:
      containers:
      - name: myjob
        image: alpine
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c', 'source /usr/share/jobdemo/jobscript.sh']
        volumeMounts:
        - mountPath: "/usr/share/jobdemo"
          name: my-persistent-volumeclaim-name
      volumes:
      - name: my-persistent-volumeclaim-name
        persistentVolumeClaim:
          claimName: my-persistent-volumeclaim
      restartPolicy: Never
      terminationGracePeriodSeconds: 0
  parallelism: 1
This spec mounts our persistent volume and the command runs our jobscript.sh
Create the Job.
kubectl create -f myWorkqueue-Job.yaml
job.batch/myjob created
Repeatedly run kubectl get pods ... monitor progress for parallelism: 1
kubectl get pods
NAME READY STATUS RESTARTS AGE
myjob-x9ghz 1/1 Running 0 3s
NAME READY STATUS RESTARTS AGE
myjob-x9ghz 1/1 Running 0 6s
NAME READY STATUS RESTARTS AGE
myjob-x9ghz 1/1 Running 0 8s
NAME READY STATUS RESTARTS AGE
myjob-x9ghz 1/1 Running 0 10s
NAME READY STATUS RESTARTS AGE
myjob-x9ghz 1/1 Running 0 12s
NAME READY STATUS RESTARTS AGE
myjob-x9ghz 0/1 Completed 0 15s
Our single Pod took 12 seconds to delete the workqueue lines 1 by 1.
We can see this in the Pod log.
kubectl logs pod/myjob-x9ghz
did some work
did some work
did some work
did some work
did some work
did some work
did some work
did some work
did some work
did some work
did some work
did some work
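A quick way to count how many work items this Pod processed ( substitute the Pod name from your own run ):
kubectl logs pod/myjob-x9ghz | grep -c 'did some work'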
Delete job.
kubectl delete -f myWorkqueue-Job.yaml
job.batch "myjob" deleted
Note last line in spec: we are now going to run 2 Pods in parallel.
nano myWorkqueue-Job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  template:
    spec:
      containers:
      - name: myjob
        image: alpine
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c', 'source /usr/share/jobdemo/jobscript.sh']
        volumeMounts:
        - mountPath: "/usr/share/jobdemo"
          name: my-persistent-volumeclaim-name
      volumes:
      - name: my-persistent-volumeclaim-name
        persistentVolumeClaim:
          claimName: my-persistent-volumeclaim
      restartPolicy: Never
      terminationGracePeriodSeconds: 0
  parallelism: 2
We need to put the deleted lines back into the workqueue file.
cp /usr/share/jobdemo/workqueue-backup /usr/share/jobdemo/workqueue
kubectl create -f myWorkqueue-Job.yaml
job.batch/myjob created
Repeatedly run kubectl get pods ... monitor parallelism: 2
kubectl get pods
NAME READY STATUS RESTARTS AGE
myjob-9nkj5 0/1 ContainerCreating 0 2s
myjob-gdrj8 0/1 ContainerCreating 0 2s
NAME READY STATUS RESTARTS AGE
myjob-9nkj5 1/1 Running 0 4s
myjob-gdrj8 1/1 Running 0 4s
NAME READY STATUS RESTARTS AGE
myjob-9nkj5 1/1 Running 0 6s
myjob-gdrj8 1/1 Running 0 6s
NAME READY STATUS RESTARTS AGE
myjob-9nkj5 1/1 Running 0 8s
myjob-gdrj8 1/1 Running 0 8s
NAME READY STATUS RESTARTS AGE
myjob-9nkj5 0/1 Completed 0 11s
myjob-gdrj8 0/1 Completed 0 11s
As expected, 2 Pods were running in parallel the whole time.
If we investigate the logs of both our Pods we can see that each did around half the work.
Both Pods then exited with exit 0 when they found 'no more work left' ( workqueue empty ).
kubectl logs pod/myjob-9nkj5
did some work
did some work
did some work
did some work
did some work
did some work
no more work left
kubectl logs pod/myjob-gdrj8
did some work
did some work
did some work
did some work
did some work
did some work
no more work left
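Instead of querying each Pod by name, the logs of all the Job's Pods can be fetched in one go via the job-name label selector ( note that kubectl may limit per-Pod output when a selector is used ):
kubectl logs -l job-name=myjob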
Get job overview.
kubectl get jobs
NAME COMPLETIONS DURATION AGE
myjob 2/1 of 2 9s 98s
We can see that 2 Pods running simultaneously took 9s versus 12s for just one Pod.
Delete job.
kubectl delete -f myWorkqueue-Job.yaml
job.batch "myjob" deleted
Note last line in spec: we are now going to run 4 Pods in parallel.
nano myWorkqueue-Job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  template:
    spec:
      containers:
      - name: myjob
        image: alpine
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c', 'source /usr/share/jobdemo/jobscript.sh']
        volumeMounts:
        - mountPath: "/usr/share/jobdemo"
          name: my-persistent-volumeclaim-name
      volumes:
      - name: my-persistent-volumeclaim-name
        persistentVolumeClaim:
          claimName: my-persistent-volumeclaim
      restartPolicy: Never
      terminationGracePeriodSeconds: 0
  parallelism: 4
We need to put the deleted lines back into the workqueue file.
cp /usr/share/jobdemo/workqueue-backup /usr/share/jobdemo/workqueue
Create the job.
kubectl create -f myWorkqueue-Job.yaml
job.batch/myjob created
Monitor:
kubectl get pods
NAME READY STATUS RESTARTS AGE
myjob-8l6bc 0/1 ContainerCreating 0 1s
myjob-krmcc 0/1 ContainerCreating 0 1s
myjob-lxd9w 0/1 ContainerCreating 0 1s
myjob-ntgf7 0/1 ContainerCreating 0 1s
NAME READY STATUS RESTARTS AGE
myjob-8l6bc 1/1 Running 0 4s
myjob-krmcc 1/1 Running 0 4s
myjob-lxd9w 1/1 Running 0 4s
myjob-ntgf7 1/1 Running 0 4s
NAME READY STATUS RESTARTS AGE
myjob-8l6bc 0/1 Completed 0 6s
myjob-krmcc 0/1 Completed 0 6s
myjob-lxd9w 1/1 Running 0 6s
myjob-ntgf7 0/1 Completed 0 6s
NAME READY STATUS RESTARTS AGE
myjob-8l6bc 0/1 Completed 0 8s
myjob-krmcc 0/1 Completed 0 8s
myjob-lxd9w 0/1 Completed 0 8s
myjob-ntgf7 0/1 Completed 0 8s
4 Pods start up simultaneously.
4 Pods run in parallel, as expected.
kubectl get jobs
NAME COMPLETIONS DURATION AGE
myjob 4/1 of 4 6s 18s
4 parallel Pods are faster than 2 ( the ContainerCreating overhead prevents it from being twice as fast ).
Describe details about our Job ( only relevant fields shown ):
kubectl describe job/myjob
Name: myjob
Parallelism: 4
Start Time: Thu, 24 Jan 2019 12:58:11 +0200
Completed At: Thu, 24 Jan 2019 12:58:17 +0200
Duration: 6s
Pods Statuses: 0 Running / 4 Succeeded / 0 Failed
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 2m17s job-controller Created pod: myjob-8l6bc
Normal SuccessfulCreate 2m17s job-controller Created pod: myjob-lxd9w
Normal SuccessfulCreate 2m17s job-controller Created pod: myjob-krmcc
Normal SuccessfulCreate 2m17s job-controller Created pod: myjob-ntgf7
4 Succeeded / 0 Failed, and only success lines in the events at the bottom.
This is an example of a perfectly executed Job.
The 2 outputs below mean the same thing ( success ... note all the 4s for our parallelism: 4 Job ).
kubectl get jobs
NAME COMPLETIONS DURATION AGE
myjob 4/1 of 4 6s 18s
4 lines of 4 Completed Pods with zero RESTARTS.
kubectl get pods
NAME READY STATUS RESTARTS AGE
myjob-8l6bc 0/1 Completed 0 8s
myjob-krmcc 0/1 Completed 0 8s
myjob-lxd9w 0/1 Completed 0 8s
myjob-ntgf7 0/1 Completed 0 8s
Delete job.
kubectl delete -f myWorkqueue-Job.yaml
job.batch "myjob" deleted
Note last line in spec: we are now going to run 8 Pods in parallel. ( On a node with only 4 CPU cores )
nano myWorkqueue-Job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  template:
    spec:
      containers:
      - name: myjob
        image: alpine
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c', 'source /usr/share/jobdemo/jobscript.sh']
        volumeMounts:
        - mountPath: "/usr/share/jobdemo"
          name: my-persistent-volumeclaim-name
      volumes:
      - name: my-persistent-volumeclaim-name
        persistentVolumeClaim:
          claimName: my-persistent-volumeclaim
      restartPolicy: Never
      terminationGracePeriodSeconds: 0
  parallelism: 8
We need to put the deleted lines back into the workqueue file.
cp /usr/share/jobdemo/workqueue-backup /usr/share/jobdemo/workqueue
Create job.
kubectl create -f myWorkqueue-Job.yaml
job.batch/myjob created
Monitor:
kubectl get pods
NAME READY STATUS RESTARTS AGE
myjob-88g75 0/1 ContainerCreating 0 2s
myjob-8wj9w 0/1 ContainerCreating 0 2s
myjob-br9bb 0/1 ContainerCreating 0 2s
myjob-pfnth 0/1 Pending 0 2s
myjob-r9p46 0/1 ContainerCreating 0 2s
myjob-scjwx 0/1 ContainerCreating 0 2s
myjob-zm92q 0/1 ContainerCreating 0 2s
myjob-zt8k6 0/1 ContainerCreating 0 2s
NAME READY STATUS RESTARTS AGE
myjob-88g75 0/1 ContainerCreating 0 6s
myjob-8wj9w 0/1 Completed 0 6s
myjob-br9bb 0/1 ContainerCreating 0 6s
myjob-pfnth 0/1 ContainerCreating 0 6s
myjob-r9p46 0/1 ContainerCreating 0 6s
myjob-scjwx 0/1 Completed 0 6s
myjob-zm92q 0/1 Completed 0 6s
myjob-zt8k6 0/1 Completed 0 6s
NAME READY STATUS RESTARTS AGE
myjob-88g75 0/1 Completed 0 10s
myjob-8wj9w 0/1 Completed 0 10s
myjob-br9bb 0/1 Completed 0 10s
myjob-pfnth 0/1 Completed 0 10s
myjob-r9p46 0/1 Completed 0 10s
myjob-scjwx 0/1 Completed 0 10s
myjob-zm92q 0/1 Completed 0 10s
myjob-zt8k6 0/1 Completed 0 10s
Determine total job runtime.
kubectl get jobs
NAME COMPLETIONS DURATION AGE
myjob 8/1 of 8 8s 19s
Using 4 Pods took 6 seconds while using 8 Pods took 8 seconds.
It does not make sense to run more CPU-intensive Pods in parallel than there are CPU cores on the server: the Pods will context switch too much.
https://en.wikipedia.org/wiki/Context_switch
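To check how many CPU cores your node reports to Kubernetes ( on a multi-node cluster this prints one value per node ):
kubectl get nodes -o jsonpath='{.items[*].status.capacity.cpu}'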
Notice how we used a very simple bash script to emulate a work queue.
Even that VERY simple script enabled us to learn a great deal about parallel jobs processing ONE shared, SIMULATED work queue.
Cleanup - delete job.
kubectl delete -f myWorkqueue-Job.yaml
job.batch "myjob" deleted
Some different ways to manage parallel jobs are discussed at https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#job-patterns