By Alwyn Botha, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud's incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.
You can suspend all future runs of a cron job by setting suspend: true in its spec.
Below is the spec for a normal running cron job. (We need a basic running cron job to suspend.)
nano myCronJob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 5']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
  concurrencyPolicy: Replace
Start the cron job.
kubectl create -f myCronJob.yaml
cronjob.batch/mycronjob created
Here is the get cronjob output:
kubectl get cronjob
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   False     0        47s             5m7s
Now the SUSPEND column becomes relevant.
Throughout this tutorial it has always been False: the job is not suspended.
Even right now we have a running cron job.
Add only the suspend: true line to myCronJob.yaml as shown below. (concurrencyPolicy is shown only so that you get the indentation right.)

  concurrencyPolicy: Replace
  suspend: true
You use kubectl replace to replace a running Kubernetes object with a new definition.
In this specific case we replace our running cron job with one that specifies suspend: true.
kubectl replace -f myCronJob.yaml
cronjob.batch/mycronjob replaced
Investigate get cronjob output again:
kubectl get cronjob
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   True      0        62s             5m22s
This time it shows SUSPEND True.
Six minutes from now, LAST SCHEDULE will show that the last job ran 6 minutes ago.
This is as expected, since the cron job is now suspended.
kubectl get cronjob
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   True      0        6m23s           10m
To get the cron job running again, we remove the suspend: true line from the spec file. concurrencyPolicy should be the last line again.

  concurrencyPolicy: Replace
Replace the running cron job with the latest spec from myCronJob.yaml:
kubectl replace -f myCronJob.yaml
cronjob.batch/mycronjob replaced
Verify SUSPEND is lifted.
kubectl get cronjob
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   False     0        6m23s           10m
Yes, SUSPEND is False.
47 seconds later ... LAST SCHEDULE shows that jobs are getting scheduled again.
kubectl get cronjob
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   False     0        47s             11m
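Note: you can also toggle this flag without editing the spec file at all, using kubectl patch directly against the live object (not used in the rest of this tutorial):

kubectl patch cronjob mycronjob -p '{"spec":{"suspend":true}}'

kubectl patch cronjob mycronjob -p '{"spec":{"suspend":false}}'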
Investigate the detail of describe cronjob/mycronjob:
kubectl describe cronjob/mycronjob
Events:
  Type    Reason            Age    From                Message
  ----    ------            ----   ----                -------
  Normal  SuccessfulCreate  11m    cronjob-controller  Created job mycronjob-1548668040
  Normal  SawCompletedJob   11m    cronjob-controller  Saw completed job: mycronjob-1548668040
  Normal  SuccessfulCreate  10m    cronjob-controller  Created job mycronjob-1548668100
  Normal  SawCompletedJob   10m    cronjob-controller  Saw completed job: mycronjob-1548668100
  Normal  SuccessfulCreate  9m13s  cronjob-controller  Created job mycronjob-1548668160
  Normal  SawCompletedJob   9m3s   cronjob-controller  Saw completed job: mycronjob-1548668160
  Normal  SuccessfulCreate  8m13s  cronjob-controller  Created job mycronjob-1548668220
  Normal  SawCompletedJob   8m3s   cronjob-controller  Saw completed job: mycronjob-1548668220
  Normal  SuccessfulDelete  8m3s   cronjob-controller  Deleted job mycronjob-1548668040
  Normal  SuccessfulCreate  7m13s  cronjob-controller  Created job mycronjob-1548668280
  Normal  SawCompletedJob   7m3s   cronjob-controller  Saw completed job: mycronjob-1548668280
  Normal  SuccessfulDelete  7m2s   cronjob-controller  Deleted job mycronjob-1548668100
  Normal  SuccessfulCreate  52s    cronjob-controller  Created job mycronjob-1548668640
  Normal  SawCompletedJob   42s    cronjob-controller  Saw completed job: mycronjob-1548668640
  Normal  SuccessfulDelete  42s    cronjob-controller  Deleted job mycronjob-1548668160
  Normal  SuccessfulCreate  21s    cronjob-controller  Created job mycronjob-1548668700
  Normal  SawCompletedJob   11s    cronjob-controller  Saw completed job: mycronjob-1548668700
  Normal  SuccessfulDelete  11s    cronjob-controller  Deleted job mycronjob-1548668220
We see a clear gap during which no jobs for this cron job ran.
Unfortunately, Kubernetes does not add event lines stating:
... cron job suspended here
and
... cron job UNsuspended here.
We have to surmise that from the gap. It could equally have been that Kubernetes was halted, the server rebooted, or something else. An event stating that a user-invoked suspend happened here would have made it clear what happened.
Demo done, delete.
kubectl delete -f myCronJob.yaml
cronjob.batch "mycronjob" deleted
Detail reference: https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#starting-deadline
From those docs: "If this field is not specified, the jobs have no deadline."
Summary: if a job misses its scheduled time by more than startingDeadlineSeconds, it gets skipped.
At the next scheduled time it will attempt to run again.
Below we have a cron job that should run every minute.
The work of this cron job is to sleep for 80 seconds.
We have concurrencyPolicy: Forbid specified. Two or more jobs may not run simultaneously.
startingDeadlineSeconds: 10 means each job must start within 10 seconds of its scheduled time.
The Pod sleeps for 80 seconds, so it will still be running a minute later. At the next minute mark, the new job cannot start (concurrencyPolicy: Forbid) because the previous job still has 20 seconds of running time left. By the time the previous job finishes, the new job is more than 10 seconds past its scheduled start, so it gets skipped. This is what we attempt to observe below.
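If you want to watch this sequence unfold live rather than re-running the commands, kubectl's standard --watch flag works well here, one command per terminal (optional; not shown in the outputs below):

kubectl get jobs --watch

kubectl get pods --watch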
nano myCronJob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 80']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 10
Create cron job.
kubectl create -f myCronJob.yaml
cronjob.batch/mycronjob created
Investigate status a minute later:
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   0/1           57s        57s
kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548669600-mzp2x   1/1     Running   0          58s
The previous job continues to run. No new job has started (by this point in the output it should already have been running for 15 seconds). The new job was skipped, as expected.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   0/1           65s        65s
kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548669600-mzp2x   1/1     Running   0          65s
80 seconds in, the first job has Completed.
startingDeadlineSeconds: 10 prevents the second job from starting now: it is already 20 seconds past its planned start time, so it gets skipped.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   1/1           82s        84s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548669600-mzp2x   0/1     Completed   0          84s
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   1/1           82s        97s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548669600-mzp2x   0/1     Completed   0          97s
At the two-minute mark a new job starts, exactly on the minute switchover.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   1/1           82s        118s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548669600-mzp2x   0/1     Completed   0          119s
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   1/1           82s        2m5s
mycronjob-1548669720   0/1           5s         5s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548669600-mzp2x   0/1     Completed   0          2m6s
mycronjob-1548669720-6dmrh   1/1     Running     0          6s
The output below is as expected:
a SawCompletedJob event appears every 80 seconds, as each sleep 80 completes.
kubectl describe cronjob/mycronjob
Name:                       mycronjob
Schedule:                   */1 * * * *
Concurrency Policy:         Forbid
Starting Deadline Seconds:  10s
Pod Template:
  Command:
    echo Job Pod is Running ; sleep 80
Events:
  Type    Reason            Age    From                Message
  ----    ------            ----   ----                -------
  Normal  SuccessfulCreate  2m16s  cronjob-controller  Created job mycronjob-1548669600
  Normal  SawCompletedJob   46s    cronjob-controller  Saw completed job: mycronjob-1548669600
  Normal  SuccessfulCreate  16s    cronjob-controller  Created job mycronjob-1548669720
Delete ...
kubectl delete -f myCronJob.yaml
cronjob.batch "mycronjob" deleted
Kubernetes can handle three main types of parallel jobs.
For more information, see https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#parallel-jobs
This section deals with a cron job that periodically runs a job with a parallelism of three.
A basic Job with a parallelism of three has one job running, with three Pods running.
A CronJob with a parallelism of three also has one job with three Pods running, with one major difference: it runs periodically.
You have to define such cron jobs carefully so that too many jobs do not unintentionally run in parallel too frequently; otherwise you end up with a CPU-overloading mass of jobs.
This first example runs only 3 Pods in parallel, each for 5 seconds (no problems).
nano myCronJob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 5']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      parallelism: 3
  concurrencyPolicy: Allow
Create job.
kubectl create -f myCronJob.yaml
cronjob.batch/mycronjob created
Check progress:
3 pods running in parallel.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745200   0/1 of 3      3s         3s
kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548745200-2gg4s   1/1     Running   0          3s
mycronjob-1548745200-nslj8   1/1     Running   0          3s
mycronjob-1548745200-rhcnf   1/1     Running   0          3s
11 seconds later: each Pod slept for 5 seconds in parallel; all have now completed.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745200   3/1 of 3      7s         11s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745200-2gg4s   0/1     Completed   0          11s
mycronjob-1548745200-nslj8   0/1     Completed   0          11s
mycronjob-1548745200-rhcnf   0/1     Completed   0          11s
A minute later, the second set of 3 Pods has completed.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745200   3/1 of 3      7s         67s
mycronjob-1548745260   3/1 of 3      7s         7s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745200-2gg4s   0/1     Completed   0          67s
mycronjob-1548745200-nslj8   0/1     Completed   0          67s
mycronjob-1548745200-rhcnf   0/1     Completed   0          67s
mycronjob-1548745260-bk84s   0/1     Completed   0          7s
mycronjob-1548745260-rpv7h   0/1     Completed   0          7s
mycronjob-1548745260-z87mk   0/1     Completed   0          7s
Two minutes in, the third set of 3 Pods is running.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745200   3/1 of 3      7s         2m5s
mycronjob-1548745260   3/1 of 3      7s         65s
mycronjob-1548745320   0/1 of 3      4s         4s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745200-2gg4s   0/1     Completed   0          2m5s
mycronjob-1548745200-nslj8   0/1     Completed   0          2m5s
mycronjob-1548745200-rhcnf   0/1     Completed   0          2m5s
mycronjob-1548745260-bk84s   0/1     Completed   0          65s
mycronjob-1548745260-rpv7h   0/1     Completed   0          65s
mycronjob-1548745260-z87mk   0/1     Completed   0          65s
mycronjob-1548745320-bk2mg   1/1     Running     0          4s
mycronjob-1548745320-fbg9v   1/1     Running     0          4s
mycronjob-1548745320-wpblf   1/1     Running     0          4s
No overlapping running jobs: simple.
kubectl delete -f myCronJob.yaml
cronjob.batch "mycronjob" deleted
This time, 6 Pods need to complete for the job to be considered complete: completions: 6.
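A quick calculation of what to expect: with parallelism: 3 and completions: 6, each job run consists of two waves of 3 Pods, and each Pod sleeps 5 seconds, so a full run should take roughly 10 to 15 seconds, comfortably within the one-minute schedule. The DURATION column in the outputs below confirms this: about 12 seconds.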
nano myCronJob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 5']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      parallelism: 3
      completions: 6
  concurrencyPolicy: Allow
Create:
kubectl create -f myCronJob.yaml
cronjob.batch/mycronjob created
Monitor: parallelism: 3 ... 3 Pods start simultaneously.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   0/6           1s         1s
kubectl get pods
NAME                         READY   STATUS              RESTARTS   AGE
mycronjob-1548745740-9jb8w   0/1     ContainerCreating   0          1s
mycronjob-1548745740-q6jwn   0/1     ContainerCreating   0          1s
mycronjob-1548745740-w6tmg   0/1     ContainerCreating   0          1s
Seconds later: the first set of 3 has completed, the second set of 3 is running.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   3/6           8s         8s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745740-4lkg9   1/1     Running     0          2s
mycronjob-1548745740-9jb8w   0/1     Completed   0          8s
mycronjob-1548745740-f5qzk   1/1     Running     0          2s
mycronjob-1548745740-pkfn5   1/1     Running     0          2s
mycronjob-1548745740-q6jwn   0/1     Completed   0          8s
mycronjob-1548745740-w6tmg   0/1     Completed   0          8s
Seconds later, the second set of 3 has completed: completions: 6 reached.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   6/6           12s        17s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745740-4lkg9   0/1     Completed   0          12s
mycronjob-1548745740-9jb8w   0/1     Completed   0          18s
mycronjob-1548745740-f5qzk   0/1     Completed   0          12s
mycronjob-1548745740-pkfn5   0/1     Completed   0          12s
mycronjob-1548745740-q6jwn   0/1     Completed   0          18s
mycronjob-1548745740-w6tmg   0/1     Completed   0          18s
One minute later this cycle repeats.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   6/6           12s        63s
mycronjob-1548745800   0/6           3s         3s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745740-4lkg9   0/1     Completed   0          57s
mycronjob-1548745740-9jb8w   0/1     Completed   0          63s
mycronjob-1548745740-f5qzk   0/1     Completed   0          57s
mycronjob-1548745740-pkfn5   0/1     Completed   0          57s
mycronjob-1548745740-q6jwn   0/1     Completed   0          63s
mycronjob-1548745740-w6tmg   0/1     Completed   0          63s
mycronjob-1548745800-4bvgz   1/1     Running     0          3s
mycronjob-1548745800-csfr5   1/1     Running     0          3s
mycronjob-1548745800-qddtw   1/1     Running     0          3s
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   6/6           12s        67s
mycronjob-1548745800   3/6           7s         7s
kubectl get pods
NAME                         READY   STATUS              RESTARTS   AGE
mycronjob-1548745740-4lkg9   0/1     Completed           0          61s
mycronjob-1548745740-9jb8w   0/1     Completed           0          67s
mycronjob-1548745740-f5qzk   0/1     Completed           0          61s
mycronjob-1548745740-pkfn5   0/1     Completed           0          61s
mycronjob-1548745740-q6jwn   0/1     Completed           0          67s
mycronjob-1548745740-w6tmg   0/1     Completed           0          67s
mycronjob-1548745800-4bvgz   0/1     Completed           0          7s
mycronjob-1548745800-4mg4b   1/1     Running             0          1s
mycronjob-1548745800-csfr5   0/1     Completed           0          7s
mycronjob-1548745800-kl295   0/1     ContainerCreating   0          1s
mycronjob-1548745800-mw6d7   0/1     ContainerCreating   0          1s
mycronjob-1548745800-qddtw   0/1     Completed           0          7s
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   6/6           12s        75s
mycronjob-1548745800   6/6           12s        15s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745740-4lkg9   0/1     Completed   0          69s
mycronjob-1548745740-9jb8w   0/1     Completed   0          75s
mycronjob-1548745740-f5qzk   0/1     Completed   0          69s
mycronjob-1548745740-pkfn5   0/1     Completed   0          69s
mycronjob-1548745740-q6jwn   0/1     Completed   0          75s
mycronjob-1548745740-w6tmg   0/1     Completed   0          75s
mycronjob-1548745800-4bvgz   0/1     Completed   0          15s
mycronjob-1548745800-4mg4b   0/1     Completed   0          9s
mycronjob-1548745800-csfr5   0/1     Completed   0          15s
mycronjob-1548745800-kl295   0/1     Completed   0          9s
mycronjob-1548745800-mw6d7   0/1     Completed   0          9s
mycronjob-1548745800-qddtw   0/1     Completed   0          15s
Demo complete, delete ...
kubectl delete -f myCronJob.yaml
cronjob.batch "mycronjob" deleted
This time: 2 parallel Pods, 4 completions, and each Pod sleeps 120 seconds.
Jobs will overlap, since concurrencyPolicy: Allow.
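A quick calculation shows why they must overlap: 4 completions at 2 Pods at a time means two waves of 120 seconds each, so every job run needs about 240 seconds, yet a new job gets created every 60 seconds. At steady state, roughly four jobs (8 Pods) run simultaneously.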
nano myCronJob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 120']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      parallelism: 2
      completions: 4
  concurrencyPolicy: Allow
Create:
kubectl create -f myCronJob.yaml
cronjob.batch/mycronjob created
2 Pods of the first job start simultaneously.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747000   0/4           3s         3s
kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548747000-8kddx   1/1     Running   0          3s
mycronjob-1548747000-pv5f7   1/1     Running   0          3s
After one minute, a second job has started with its own set of 2 Pods.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747000   0/4           65s        65s
mycronjob-1548747060   0/4           5s         5s
kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548747000-8kddx   1/1     Running   0          65s
mycronjob-1548747000-pv5f7   1/1     Running   0          65s
mycronjob-1548747060-98gfj   1/1     Running   0          5s
mycronjob-1548747060-ltlp4   1/1     Running   0          5s
Another minute later, a third job has started, with 2 Pods in ContainerCreating.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747000   0/4           2m         2m
mycronjob-1548747060   0/4           60s        60s
(Missed by milliseconds: mycronjob-1548747120 should have been listed here; its 2 Pods are listed below.)
kubectl get pods
NAME                         READY   STATUS              RESTARTS   AGE
mycronjob-1548747000-8kddx   1/1     Running             0          2m
mycronjob-1548747000-pv5f7   1/1     Running             0          2m
mycronjob-1548747060-98gfj   1/1     Running             0          60s
mycronjob-1548747060-ltlp4   1/1     Running             0          60s
mycronjob-1548747120-876jx   0/1     ContainerCreating   0          0s
mycronjob-1548747120-vpv8p   0/1     ContainerCreating   0          0s
Several minutes later we have the mess below: four jobs, started in four different minutes, all have Pods running simultaneously.
So instead of 2 parallel Pods, you now have 8.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747000   2/4           3m45s      3m45s
mycronjob-1548747060   2/4           2m45s      2m45s
mycronjob-1548747120   0/4           105s       105s
mycronjob-1548747180   0/4           45s        45s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548747000-8kddx   0/1     Completed   0          3m45s
mycronjob-1548747000-cpt97   1/1     Running     0          104s
mycronjob-1548747000-dc8z5   1/1     Running     0          104s
mycronjob-1548747000-pv5f7   0/1     Completed   0          3m45s
mycronjob-1548747060-98gfj   0/1     Completed   0          2m45s
mycronjob-1548747060-jmkld   1/1     Running     0          44s
mycronjob-1548747060-khnng   1/1     Running     0          44s
mycronjob-1548747060-ltlp4   0/1     Completed   0          2m45s
mycronjob-1548747120-876jx   1/1     Running     0          105s
mycronjob-1548747120-vpv8p   1/1     Running     0          105s
mycronjob-1548747180-2kbpf   1/1     Running     0          45s
mycronjob-1548747180-rxgl8   1/1     Running     0          45s
TIP: Do not schedule long-running cron jobs too close together in time.
Delete.
kubectl delete -f myCronJob.yaml
cronjob.batch "mycronjob" deleted
concurrencyPolicy: Forbid skips new jobs that try to run while the previous job is still busy.
This fixes the problem we had in the previous section.
nano myCronJob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 120']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      parallelism: 2
      completions: 4
  concurrencyPolicy: Forbid
Create.
kubectl create -f myCronJob.yaml
cronjob.batch/mycronjob created
Monitor: 2 Pods started.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747900   0/4           5s         5s
kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548747900-bs69b   1/1     Running   0          5s
mycronjob-1548747900-dmstt   1/1     Running   0          5s
A minute later, the original Pods are still running; no new job and no new Pods start.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747900   0/4           67s        67s
kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548747900-bs69b   1/1     Running   0          67s
mycronjob-1548747900-dmstt   1/1     Running   0          67s
3 minutes later: the first wave of 2 Pods has completed and the same job has started its second wave of 2 Pods (COMPLETIONS 2/4). Still no new job: the first job is not finished yet.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747900   2/4           3m15s      3m15s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548747900-bs69b   0/1     Completed   0          3m15s
mycronjob-1548747900-dmstt   0/1     Completed   0          3m15s
mycronjob-1548747900-mg5g2   1/1     Running     0          73s
mycronjob-1548747900-ztlgc   1/1     Running     0          73s
Another minute later, the 1548747900 job's Pods have all completed (4/4); a new job's Pods are running.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747900   4/4           4m3s       4m14s
mycronjob-1548748140   0/4           4s         4s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548747900-bs69b   0/1     Completed   0          4m15s
mycronjob-1548747900-dmstt   0/1     Completed   0          4m15s
mycronjob-1548747900-mg5g2   0/1     Completed   0          2m13s
mycronjob-1548747900-ztlgc   0/1     Completed   0          2m13s
mycronjob-1548748140-49hdp   1/1     Running     0          5s
mycronjob-1548748140-rw56f   1/1     Running     0          5s
concurrencyPolicy: Forbid may be the solution if you unintentionally have too many Pods running simultaneously.
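As with suspend earlier, you could also switch the policy on the live cron job with kubectl patch instead of editing and replacing the spec file (not shown in this tutorial's outputs):

kubectl patch cronjob mycronjob -p '{"spec":{"concurrencyPolicy":"Forbid"}}'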
Delete.
kubectl delete -f myCronJob.yaml
cronjob.batch "mycronjob" deleted
concurrencyPolicy: Replace
From https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#concurrency-policy :
"If it is time for a new job run and the previous job run hasn't finished yet, the cron job replaces the currently running job run with a new job run."
nano myCronJob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 120']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      parallelism: 2
      completions: 4
  concurrencyPolicy: Replace
Create cron job.
kubectl create -f myCronJob.yaml
cronjob.batch/mycronjob created
Investigate what happened after 12 minutes.
kubectl describe cronjob/mycronjob
Name:                mycronjob
Schedule:            */1 * * * *
Concurrency Policy:  Replace
Parallelism:         2
Completions:         4
Pod Template:
  Command:
    echo Job Pod is Running ; sleep 120
Events:
  Type    Reason            Age                  From                Message
  ----    ------            ----                 ----                -------
  Normal  SuccessfulCreate  12m                  cronjob-controller  Created job mycronjob-1548748620
  Normal  SuccessfulDelete  11m                  cronjob-controller  Deleted job mycronjob-1548748620
  Normal  SuccessfulCreate  11m                  cronjob-controller  Created job mycronjob-1548748680
  Normal  SuccessfulDelete  10m                  cronjob-controller  Deleted job mycronjob-1548748680
  Normal  SuccessfulCreate  10m                  cronjob-controller  Created job mycronjob-1548748740
  Normal  SuccessfulDelete  9m27s                cronjob-controller  Deleted job mycronjob-1548748740
  Normal  SuccessfulCreate  9m27s                cronjob-controller  Created job mycronjob-1548748800
  Normal  SuccessfulDelete  8m27s                cronjob-controller  Deleted job mycronjob-1548748800
  Normal  SuccessfulCreate  8m27s                cronjob-controller  Created job mycronjob-1548748860
  Normal  SuccessfulDelete  7m27s                cronjob-controller  Deleted job mycronjob-1548748860
  Normal  SuccessfulCreate  7m27s                cronjob-controller  Created job mycronjob-1548748920
  Normal  SuccessfulDelete  6m26s                cronjob-controller  Deleted job mycronjob-1548748920
  Normal  SuccessfulCreate  6m26s                cronjob-controller  Created job mycronjob-1548748980
  Normal  SuccessfulDelete  5m26s                cronjob-controller  Deleted job mycronjob-1548748980
  Normal  SuccessfulCreate  5m26s                cronjob-controller  Created job mycronjob-1548749040
  Normal  SuccessfulDelete  4m26s                cronjob-controller  Deleted job mycronjob-1548749040
  Normal  SuccessfulCreate  4m26s                cronjob-controller  Created job mycronjob-1548749100
  Normal  SuccessfulDelete  3m26s                cronjob-controller  Deleted job mycronjob-1548749100
  Normal  SuccessfulCreate  25s (x4 over 3m26s)  cronjob-controller  (combined from similar events): Created job mycronjob-1548749340
  Normal  SuccessfulDelete  25s (x3 over 2m26s)  cronjob-controller  (combined from similar events): Deleted job mycronjob-1548749280
One new job started every minute.
The previously running job is deleted every time a new job starts. This is concurrencyPolicy: Replace in action.
IMPORTANT: NO job completes. No job ever runs for its full 120 sleep seconds.
Use concurrencyPolicy: Replace only if you understand how it works and need exactly this feature: if it is time for a new job run and the previous run has not finished yet, the cron job replaces the currently running job with a new one. (Replace deletes the previous job, its Pod, and its logs.)
IMPORTANT: in this specific tutorial case, NO job ever completes successfully; it ALWAYS gets replaced. In your environment, Replace only replaces a job when a previous one is still running, which in production will probably be nearly never. Understand Replace and its implications: it may be wrong, or perfectly right, for your case.
These commands show only the latest running job and its Pods.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548749400   0/4           3s         3s
kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548749400-fcrrt   1/1     Running   0          3s
mycronjob-1548749400-n2wbs   1/1     Running   0          3s
kubectl get cronjobs
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   False     1        22s             13m
Delete cron job.
kubectl delete -f myCronJob.yaml
cronjob.batch "mycronjob" deleted
I am impressed by, and agree with, Kubernetes job and cron job functionality in general.
However, when a cron job has problems that trigger the backoffLimit functionality, it leaves no trace evidence.
A long-running job may sometimes experience intermittent and changing problems.
backoffLimit cleanup deletes crashed jobs, their Pods, and their logs; you are left with only the currently running job and Pod. kubectl get cronjob does not even hint that the cron job has problems.
kubectl describe shows misleading event messages: all seems OK, because only success events are shown.
Let's investigate.
From https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#pod-backoff-failure-policy
> Pod Backoff failure policy
> There are situations where you want to fail a Job after some amount of retries due to a logical error in configuration etc. To do so, set .spec.backoffLimit to specify the number of retries before considering a Job as failed. The back-off limit is set by default to 6.
> Failed Pods associated with the Job are recreated by the Job controller with an exponential back-off delay (10s, 20s, 40s …) capped at six minutes. The back-off count is reset if no new failed Pods appear before the Job's next status check.
The cron job below should run every minute.
It exits immediately upon start with error exit code 1.
It has a backoffLimit of 2: on failure, its Pod will be retried at most twice before the job is considered failed.
nano myCronJob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; exit 1']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      backoffLimit: 2
Create the cron job.
kubectl create -f myCronJob.yaml
cronjob.batch/mycronjob created
13 seconds later, the first Pod has tried to start and hit the exit 1 error.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548752820   0/1           13s        13s
kubectl get pods
NAME                         READY   STATUS             RESTARTS   AGE
mycronjob-1548752820-7czts   0/1     CrashLoopBackOff   1          13s
However, look at the kubectl describe events below.
The Saw completed job event makes it sound as if the job completed successfully, but it did not.
There is no indication of the exit 1 condition, and no indication of the CrashLoopBackOff status.
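While the failed Pod still exists, you can recover that missing information from the Pod itself rather than from the cron job. For example, using the Pod name from the output above:

kubectl describe pod mycronjob-1548752820-7czts

kubectl logs mycronjob-1548752820-7czts

describe pod shows the container's last exit code and restart count; logs shows the Pod's output. Once the job gets deleted, both are gone.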
kubectl describe cronjob/mycronjob
Name:      mycronjob
Schedule:  */1 * * * *
Pod Template:
  Command:
    echo Job Pod is Running ; exit 1
Active Jobs:  <none>
Events:
  Type    Reason            Age  From                Message
  ----    ------            ---  ----                -------
  Normal  SuccessfulCreate  23s  cronjob-controller  Created job mycronjob-1548752820
  Normal  SawCompletedJob   3s   cronjob-controller  Saw completed job: mycronjob-1548752820
A minute later, nothing in the output below hints at a problem with the cron job.
kubectl get cronjobs
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   False     0        55s             68s
The output below also does not hint at a problem.
It seems the first job is just running slowly: 77 seconds and still busy, zero COMPLETIONS.
In reality, the Pod of the first job crashed within its first second.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548752820   0/1           77s        77s
mycronjob-1548752880   0/1           17s        17s
If we look at kubectl describe we see:
kubectl describe cronjob/mycronjob
Name:  mycronjob
Events:
  Type    Reason            Age    From                Message
  ----    ------            ----   ----                -------
  Normal  SuccessfulCreate  2m26s  cronjob-controller  Created job mycronjob-1548752820
  Normal  SawCompletedJob   2m6s   cronjob-controller  Saw completed job: mycronjob-1548752820
  Normal  SuccessfulCreate  86s    cronjob-controller  Created job mycronjob-1548752880
  Normal  SawCompletedJob   66s    cronjob-controller  Saw completed job: mycronjob-1548752880
  Normal  SuccessfulDelete  66s    cronjob-controller  Deleted job mycronjob-1548752820
  Normal  SuccessfulCreate  26s    cronjob-controller  Created job mycronjob-1548752940
  Normal  SawCompletedJob   6s     cronjob-controller  Saw completed job: mycronjob-1548752940
  Normal  SuccessfulDelete  6s     cronjob-controller  Deleted job mycronjob-1548752880
Last line above: the first failed job is now deleted; its logs are gone.
The describe output above SEEMS to show all is well, but it is not: there is no indication of the CrashLoopBackOff status.
If an hourly cron job has such problems, you are left with little historical paper-trail evidence of what happened.
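One mitigation worth knowing (not used in this tutorial): the CronJob spec fields successfulJobsHistoryLimit and failedJobsHistoryLimit control how many finished jobs are retained; the defaults are 3 and 1 respectively. Raising the failed limit keeps more crashed jobs, and their Pods and logs, around for inspection. Both fields go at the same indentation level as schedule:

  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 5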
I do not enjoy troubleshooting cron jobs that use backoffLimit and behave this way.
Tip: write such cron jobs' log information to a persistent volume and use that as your primary research source. Everything is in one place and it is persistent; plus, you are in control of what information gets written to the logs.
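A minimal sketch of that tip, assuming a PersistentVolumeClaim named cron-logs already exists in the namespace (the claim name, mount path, and log file name are all illustrative):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            # append a timestamped line to a log file that outlives the Pod
            command: ['sh', '-c', 'echo "$(date) Job Pod is Running" >> /cronlogs/mycronjob.log']
            volumeMounts:
            - name: logdir
              mountPath: /cronlogs
          volumes:
          - name: logdir
            persistentVolumeClaim:
              claimName: cron-logs
          restartPolicy: OnFailure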
kubectl delete -f myCronJob.yaml
cronjob.batch "mycronjob" deleted
These spec fields (suspend, startingDeadlineSeconds, parallelism, completions, concurrencyPolicy, backoffLimit) each enable useful functionality.
Standalone, each field is easy to understand. Combined, these YAML spec fields lead to complex interactions and reactions, especially with unexpectedly long-running cron jobs.
Design your own simple tests to learn these features. Then design some complex tests to learn their interactions.
The moment you are running 2 or more different cron jobs simultaneously in 2 or more terminal windows, you will know: you are becoming an expert.