By Alwyn Botha, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud's incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.
You can suspend all future runs of a cron job by setting suspend: true in its spec.
Below is the spec for a normal running cron job. (We need a basic running cron job to suspend.)
nano myCronJob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 5']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
  concurrencyPolicy: Replace
Start the cron job.
kubectl create -f myCronJob.yaml
cronjob.batch/mycronjob created
Here is the get cronjob output:
kubectl get cronjob
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   False     0        47s             5m7s
Now the SUSPEND column becomes relevant.
Throughout this tutorial it has always been False: the job is not suspended.
Even right now we have a running cron job.
Add only the suspend: true line to myCronJob.yaml as shown below. (concurrencyPolicy is shown only so that you get the indentation right.)

  concurrencyPolicy: Replace
  suspend: true
You use kubectl replace to replace a running Kubernetes object with a new definition.
In this specific case we replace our running cron job with one that specifies suspend: true.
kubectl replace -f myCronJob.yaml
cronjob.batch/mycronjob replaced
Investigate get cronjob output again:
kubectl get cronjob
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   True      0        62s             5m22s
This time it shows SUSPEND True.
Six minutes from now, LAST SCHEDULE will show that the last job ran 6 minutes ago.
This is as expected, since the cron job is now suspended.
kubectl get cronjob
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   True      0        6m23s           10m
To get the cron job running again, we remove the suspend: true line from the spec file. concurrencyPolicy should be the last line again.

  concurrencyPolicy: Replace
Replace the running cron job with the latest spec from myCronJob.yaml:
kubectl replace -f myCronJob.yaml
cronjob.batch/mycronjob replaced
Verify SUSPEND is lifted.
kubectl get cronjob
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   False     0        6m23s           10m
Yes, SUSPEND is False.
47 seconds later ... LAST SCHEDULE shows that jobs are getting scheduled again.
kubectl get cronjob
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   False     0        47s             11m
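Note: you can also toggle this flag without editing the spec file at all, using kubectl patch directly against the live object (not used in the rest of this tutorial):

kubectl patch cronjob mycronjob -p '{"spec":{"suspend":true}}'

kubectl patch cronjob mycronjob -p '{"spec":{"suspend":false}}'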
Investigate the detail of describe cronjob/mycronjob:
kubectl describe cronjob/mycronjob
Events:
  Type    Reason            Age    From                Message
  ----    ------            ----   ----                -------
  Normal  SuccessfulCreate  11m    cronjob-controller  Created job mycronjob-1548668040
  Normal  SawCompletedJob   11m    cronjob-controller  Saw completed job: mycronjob-1548668040
  Normal  SuccessfulCreate  10m    cronjob-controller  Created job mycronjob-1548668100
  Normal  SawCompletedJob   10m    cronjob-controller  Saw completed job: mycronjob-1548668100
  Normal  SuccessfulCreate  9m13s  cronjob-controller  Created job mycronjob-1548668160
  Normal  SawCompletedJob   9m3s   cronjob-controller  Saw completed job: mycronjob-1548668160
  Normal  SuccessfulCreate  8m13s  cronjob-controller  Created job mycronjob-1548668220
  Normal  SawCompletedJob   8m3s   cronjob-controller  Saw completed job: mycronjob-1548668220
  Normal  SuccessfulDelete  8m3s   cronjob-controller  Deleted job mycronjob-1548668040
  Normal  SuccessfulCreate  7m13s  cronjob-controller  Created job mycronjob-1548668280
  Normal  SawCompletedJob   7m3s   cronjob-controller  Saw completed job: mycronjob-1548668280
  Normal  SuccessfulDelete  7m2s   cronjob-controller  Deleted job mycronjob-1548668100
  Normal  SuccessfulCreate  52s    cronjob-controller  Created job mycronjob-1548668640
  Normal  SawCompletedJob   42s    cronjob-controller  Saw completed job: mycronjob-1548668640
  Normal  SuccessfulDelete  42s    cronjob-controller  Deleted job mycronjob-1548668160
  Normal  SuccessfulCreate  21s    cronjob-controller  Created job mycronjob-1548668700
  Normal  SawCompletedJob   11s    cronjob-controller  Saw completed job: mycronjob-1548668700
  Normal  SuccessfulDelete  11s    cronjob-controller  Deleted job mycronjob-1548668220
We see a clear gap during which no jobs for this cron job ran.
Unfortunately, Kubernetes does not add event lines stating:
... cron job suspended here
and
... cron job UNsuspended here.
We have to surmise that from the gap. It could equally have been that Kubernetes was halted, the server rebooted, or something else. An event stating that a user-invoked suspend happened here would have made it clear what happened.
Demo done, delete.
kubectl delete -f myCronJob.yaml
cronjob.batch "mycronjob" deleted
Detail reference: https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#starting-deadline
From those docs: "If this field is not specified, the jobs have no deadline."
Summary: if a job misses its scheduled time by more than startingDeadlineSeconds, it gets skipped.
At the next scheduled time it will attempt to run again.
Below we have a cron job that should run every minute.
The work of this cron job is to sleep for 80 seconds.
We have concurrencyPolicy: Forbid specified. Two or more jobs may not run simultaneously.
startingDeadlineSeconds: 10 means each job must start within 10 seconds of its scheduled time.
The Pod sleeps for 80 seconds, so it will still be running a minute later. At the next minute mark, the new job cannot start (concurrencyPolicy: Forbid) because the previous job still has 20 seconds of running time left. By the time the previous job finishes, the new job is more than 10 seconds past its scheduled start, so it gets skipped. This is what we attempt to observe below.
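If you want to watch this sequence unfold live rather than re-running the commands, kubectl's standard --watch flag works well here, one command per terminal (optional; not shown in the outputs below):

kubectl get jobs --watch

kubectl get pods --watch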
nano myCronJob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 80']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 10
Create cron job.
kubectl create -f myCronJob.yaml
cronjob.batch/mycronjob created
Investigate status a minute later:
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   0/1           57s        57s
kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548669600-mzp2x   1/1     Running   0          58s
The previous job continues to run. No new job has started (by this point in the output it should already have been running for 15 seconds). The new job was skipped, as expected.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   0/1           65s        65s
kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548669600-mzp2x   1/1     Running   0          65s
80 seconds in, the first job has Completed.
startingDeadlineSeconds: 10 prevents the second job from starting now: it is already 20 seconds past its planned start time, so it gets skipped.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   1/1           82s        84s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548669600-mzp2x   0/1     Completed   0          84s
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   1/1           82s        97s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548669600-mzp2x   0/1     Completed   0          97s
At the two-minute mark a new job starts, exactly on the minute switchover.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   1/1           82s        118s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548669600-mzp2x   0/1     Completed   0          119s
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   1/1           82s        2m5s
mycronjob-1548669720   0/1           5s         5s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548669600-mzp2x   0/1     Completed   0          2m6s
mycronjob-1548669720-6dmrh   1/1     Running     0          6s
The output below is as expected:
a SawCompletedJob event appears every 80 seconds, as each sleep 80 completes.
kubectl describe cronjob/mycronjob
Name:                       mycronjob
Schedule:                   */1 * * * *
Concurrency Policy:         Forbid
Starting Deadline Seconds:  10s
Pod Template:
  Command:
    echo Job Pod is Running ; sleep 80
Events:
  Type    Reason            Age    From                Message
  ----    ------            ----   ----                -------
  Normal  SuccessfulCreate  2m16s  cronjob-controller  Created job mycronjob-1548669600
  Normal  SawCompletedJob   46s    cronjob-controller  Saw completed job: mycronjob-1548669600
  Normal  SuccessfulCreate  16s    cronjob-controller  Created job mycronjob-1548669720
Delete ...
kubectl delete -f myCronJob.yaml
cronjob.batch "mycronjob" deleted
Kubernetes can handle three main types of parallel jobs.
For more information, see https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#parallel-jobs
This section deals with a cron job that periodically runs a job with a parallelism of three.
A basic Job with a parallelism of three has one job running, with three Pods running.
A CronJob with a parallelism of three also has one job with three Pods running, with one major difference: it runs periodically.
You have to define such cron jobs carefully so that too many jobs do not unintentionally run in parallel too frequently; otherwise you end up with a CPU-overloading mass of jobs.
This first example runs only 3 Pods in parallel, each for 5 seconds (no problems).
nano myCronJob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 5']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      parallelism: 3
  concurrencyPolicy: Allow
Create job.
kubectl create -f myCronJob.yaml
cronjob.batch/mycronjob created
Check progress:
3 pods running in parallel.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745200   0/1 of 3      3s         3s
kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548745200-2gg4s   1/1     Running   0          3s
mycronjob-1548745200-nslj8   1/1     Running   0          3s
mycronjob-1548745200-rhcnf   1/1     Running   0          3s
11 seconds later: each Pod slept for 5 seconds in parallel; all have now completed.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745200   3/1 of 3      7s         11s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745200-2gg4s   0/1     Completed   0          11s
mycronjob-1548745200-nslj8   0/1     Completed   0          11s
mycronjob-1548745200-rhcnf   0/1     Completed   0          11s
A minute later, the second set of 3 Pods has completed.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745200   3/1 of 3      7s         67s
mycronjob-1548745260   3/1 of 3      7s         7s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745200-2gg4s   0/1     Completed   0          67s
mycronjob-1548745200-nslj8   0/1     Completed   0          67s
mycronjob-1548745200-rhcnf   0/1     Completed   0          67s
mycronjob-1548745260-bk84s   0/1     Completed   0          7s
mycronjob-1548745260-rpv7h   0/1     Completed   0          7s
mycronjob-1548745260-z87mk   0/1     Completed   0          7s
Two minutes in, the third set of 3 Pods is running.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745200   3/1 of 3      7s         2m5s
mycronjob-1548745260   3/1 of 3      7s         65s
mycronjob-1548745320   0/1 of 3      4s         4s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745200-2gg4s   0/1     Completed   0          2m5s
mycronjob-1548745200-nslj8   0/1     Completed   0          2m5s
mycronjob-1548745200-rhcnf   0/1     Completed   0          2m5s
mycronjob-1548745260-bk84s   0/1     Completed   0          65s
mycronjob-1548745260-rpv7h   0/1     Completed   0          65s
mycronjob-1548745260-z87mk   0/1     Completed   0          65s
mycronjob-1548745320-bk2mg   1/1     Running     0          4s
mycronjob-1548745320-fbg9v   1/1     Running     0          4s
mycronjob-1548745320-wpblf   1/1     Running     0          4s
No overlapping running jobs: simple.
kubectl delete -f myCronJob.yaml
cronjob.batch "mycronjob" deleted
This time, 6 Pods need to complete for the job to be considered complete: completions: 6.
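A quick calculation of what to expect: with parallelism: 3 and completions: 6, each job run consists of two waves of 3 Pods, and each Pod sleeps 5 seconds, so a full run should take roughly 10 to 15 seconds, comfortably within the one-minute schedule. The DURATION column in the outputs below confirms this: about 12 seconds.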
nano myCronJob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 5']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      parallelism: 3
      completions: 6
  concurrencyPolicy: Allow
Create:
kubectl create -f myCronJob.yaml
cronjob.batch/mycronjob created
Monitor: parallelism: 3 ... 3 Pods start simultaneously.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   0/6           1s         1s
kubectl get pods
NAME                         READY   STATUS              RESTARTS   AGE
mycronjob-1548745740-9jb8w   0/1     ContainerCreating   0          1s
mycronjob-1548745740-q6jwn   0/1     ContainerCreating   0          1s
mycronjob-1548745740-w6tmg   0/1     ContainerCreating   0          1s
Seconds later: the first set of 3 has completed, the second set of 3 is running.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   3/6           8s         8s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745740-4lkg9   1/1     Running     0          2s
mycronjob-1548745740-9jb8w   0/1     Completed   0          8s
mycronjob-1548745740-f5qzk   1/1     Running     0          2s
mycronjob-1548745740-pkfn5   1/1     Running     0          2s
mycronjob-1548745740-q6jwn   0/1     Completed   0          8s
mycronjob-1548745740-w6tmg   0/1     Completed   0          8s
Seconds later, the second set of 3 has completed: completions: 6 reached.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   6/6           12s        17s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745740-4lkg9   0/1     Completed   0          12s
mycronjob-1548745740-9jb8w   0/1     Completed   0          18s
mycronjob-1548745740-f5qzk   0/1     Completed   0          12s
mycronjob-1548745740-pkfn5   0/1     Completed   0          12s
mycronjob-1548745740-q6jwn   0/1     Completed   0          18s
mycronjob-1548745740-w6tmg   0/1     Completed   0          18s
One minute later this cycle repeats.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   6/6           12s        63s
mycronjob-1548745800   0/6           3s         3s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745740-4lkg9   0/1     Completed   0          57s
mycronjob-1548745740-9jb8w   0/1     Completed   0          63s
mycronjob-1548745740-f5qzk   0/1     Completed   0          57s
mycronjob-1548745740-pkfn5   0/1     Completed   0          57s
mycronjob-1548745740-q6jwn   0/1     Completed   0          63s
mycronjob-1548745740-w6tmg   0/1     Completed   0          63s
mycronjob-1548745800-4bvgz   1/1     Running     0          3s
mycronjob-1548745800-csfr5   1/1     Running     0          3s
mycronjob-1548745800-qddtw   1/1     Running     0          3s
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   6/6           12s        67s
mycronjob-1548745800   3/6           7s         7s
kubectl get pods
NAME                         READY   STATUS              RESTARTS   AGE
mycronjob-1548745740-4lkg9   0/1     Completed           0          61s
mycronjob-1548745740-9jb8w   0/1     Completed           0          67s
mycronjob-1548745740-f5qzk   0/1     Completed           0          61s
mycronjob-1548745740-pkfn5   0/1     Completed           0          61s
mycronjob-1548745740-q6jwn   0/1     Completed           0          67s
mycronjob-1548745740-w6tmg   0/1     Completed           0          67s
mycronjob-1548745800-4bvgz   0/1     Completed           0          7s
mycronjob-1548745800-4mg4b   1/1     Running             0          1s
mycronjob-1548745800-csfr5   0/1     Completed           0          7s
mycronjob-1548745800-kl295   0/1     ContainerCreating   0          1s
mycronjob-1548745800-mw6d7   0/1     ContainerCreating   0          1s
mycronjob-1548745800-qddtw   0/1     Completed           0          7s
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   6/6           12s        75s
mycronjob-1548745800   6/6           12s        15s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745740-4lkg9   0/1     Completed   0          69s
mycronjob-1548745740-9jb8w   0/1     Completed   0          75s
mycronjob-1548745740-f5qzk   0/1     Completed   0          69s
mycronjob-1548745740-pkfn5   0/1     Completed   0          69s
mycronjob-1548745740-q6jwn   0/1     Completed   0          75s
mycronjob-1548745740-w6tmg   0/1     Completed   0          75s
mycronjob-1548745800-4bvgz   0/1     Completed   0          15s
mycronjob-1548745800-4mg4b   0/1     Completed   0          9s
mycronjob-1548745800-csfr5   0/1     Completed   0          15s
mycronjob-1548745800-kl295   0/1     Completed   0          9s
mycronjob-1548745800-mw6d7   0/1     Completed   0          9s
mycronjob-1548745800-qddtw   0/1     Completed   0          15s
Demo complete, delete ...
kubectl delete -f myCronJob.yaml
cronjob.batch "mycronjob" deleted
This time: 2 parallel Pods, 4 completions, and each Pod sleeps 120 seconds.
Jobs will overlap, since concurrencyPolicy: Allow.
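A quick calculation shows why they must overlap: 4 completions at 2 Pods at a time means two waves of 120 seconds each, so every job run needs about 240 seconds, yet a new job gets created every 60 seconds. At steady state, roughly four jobs (8 Pods) run simultaneously.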
nano myCronJob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 120']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      parallelism: 2
      completions: 4
  concurrencyPolicy: Allow
Create:
kubectl create -f myCronJob.yaml
cronjob.batch/mycronjob created
2 Pods of the first job start simultaneously.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747000   0/4           3s         3s
kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548747000-8kddx   1/1     Running   0          3s
mycronjob-1548747000-pv5f7   1/1     Running   0          3s
After one minute, a second job has started with its own set of 2 Pods.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747000   0/4           65s        65s
mycronjob-1548747060   0/4           5s         5s
kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548747000-8kddx   1/1     Running   0          65s
mycronjob-1548747000-pv5f7   1/1     Running   0          65s
mycronjob-1548747060-98gfj   1/1     Running   0          5s
mycronjob-1548747060-ltlp4   1/1     Running   0          5s
Another minute later, a third job has started, with 2 Pods in ContainerCreating.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747000   0/4           2m         2m
mycronjob-1548747060   0/4           60s        60s
(Missed by milliseconds: mycronjob-1548747120 should have been listed here; its 2 Pods are listed below.)
kubectl get pods
NAME                         READY   STATUS              RESTARTS   AGE
mycronjob-1548747000-8kddx   1/1     Running             0          2m
mycronjob-1548747000-pv5f7   1/1     Running             0          2m
mycronjob-1548747060-98gfj   1/1     Running             0          60s
mycronjob-1548747060-ltlp4   1/1     Running             0          60s
mycronjob-1548747120-876jx   0/1     ContainerCreating   0          0s
mycronjob-1548747120-vpv8p   0/1     ContainerCreating   0          0s
Several minutes later we have the mess below: four jobs, started in four different minutes, all have Pods running simultaneously.
So instead of 2 parallel Pods, you now have 8.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747000   2/4           3m45s      3m45s
mycronjob-1548747060   2/4           2m45s      2m45s
mycronjob-1548747120   0/4           105s       105s
mycronjob-1548747180   0/4           45s        45s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548747000-8kddx   0/1     Completed   0          3m45s
mycronjob-1548747000-cpt97   1/1     Running     0          104s
mycronjob-1548747000-dc8z5   1/1     Running     0          104s
mycronjob-1548747000-pv5f7   0/1     Completed   0          3m45s
mycronjob-1548747060-98gfj   0/1     Completed   0          2m45s
mycronjob-1548747060-jmkld   1/1     Running     0          44s
mycronjob-1548747060-khnng   1/1     Running     0          44s
mycronjob-1548747060-ltlp4   0/1     Completed   0          2m45s
mycronjob-1548747120-876jx   1/1     Running     0          105s
mycronjob-1548747120-vpv8p   1/1     Running     0          105s
mycronjob-1548747180-2kbpf   1/1     Running     0          45s
mycronjob-1548747180-rxgl8   1/1     Running     0          45s
TIP: Do not schedule long-running cron jobs too close together in time.
Delete.
kubectl delete -f myCronJob.yaml
cronjob.batch "mycronjob" deleted
concurrencyPolicy: Forbid skips new jobs that try to run while the previous job is still busy.
This fixes the problem we had in the previous section.
nano myCronJob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 120']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      parallelism: 2
      completions: 4
  concurrencyPolicy: Forbid
Create.
kubectl create -f myCronJob.yaml
cronjob.batch/mycronjob created
Monitor: 2 Pods started.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747900   0/4           5s         5s
kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548747900-bs69b   1/1     Running   0          5s
mycronjob-1548747900-dmstt   1/1     Running   0          5s
A minute later, the original Pods are still running; no new job and no new Pods start.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747900   0/4           67s        67s
kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548747900-bs69b   1/1     Running   0          67s
mycronjob-1548747900-dmstt   1/1     Running   0          67s
3 minutes later: the first wave of 2 Pods has completed and the same job has started its second wave of 2 Pods (COMPLETIONS 2/4). Still no new job: the first job is not finished yet.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747900   2/4           3m15s      3m15s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548747900-bs69b   0/1     Completed   0          3m15s
mycronjob-1548747900-dmstt   0/1     Completed   0          3m15s
mycronjob-1548747900-mg5g2   1/1     Running     0          73s
mycronjob-1548747900-ztlgc   1/1     Running     0          73s
Another minute later, the 1548747900 job's Pods have all completed (4/4); a new job's Pods are running.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747900   4/4           4m3s       4m14s
mycronjob-1548748140   0/4           4s         4s
kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548747900-bs69b   0/1     Completed   0          4m15s
mycronjob-1548747900-dmstt   0/1     Completed   0          4m15s
mycronjob-1548747900-mg5g2   0/1     Completed   0          2m13s
mycronjob-1548747900-ztlgc   0/1     Completed   0          2m13s
mycronjob-1548748140-49hdp   1/1     Running     0          5s
mycronjob-1548748140-rw56f   1/1     Running     0          5s
concurrencyPolicy: Forbid may be the solution if you unintentionally have too many Pods running simultaneously.
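As with suspend earlier, you could also switch the policy on the live cron job with kubectl patch instead of editing and replacing the spec file (not shown in this tutorial's outputs):

kubectl patch cronjob mycronjob -p '{"spec":{"concurrencyPolicy":"Forbid"}}'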
Delete.
kubectl delete -f myCronJob.yaml
cronjob.batch "mycronjob" deleted
concurrencyPolicy: Replace
From https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#concurrency-policy :
"If it is time for a new job run and the previous job run hasn't finished yet, the cron job replaces the currently running job run with a new job run."
nano myCronJob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 120']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      parallelism: 2
      completions: 4
  concurrencyPolicy: Replace
Create cron job.
kubectl create -f myCronJob.yaml
cronjob.batch/mycronjob created
Investigate what happened after 12 minutes.
kubectl describe cronjob/mycronjob
Name:                mycronjob
Schedule:            */1 * * * *
Concurrency Policy:  Replace
Parallelism:         2
Completions:         4
Pod Template:
  Command:
    echo Job Pod is Running ; sleep 120
Events:
  Type    Reason            Age                  From                Message
  ----    ------            ----                 ----                -------
  Normal  SuccessfulCreate  12m                  cronjob-controller  Created job mycronjob-1548748620
  Normal  SuccessfulDelete  11m                  cronjob-controller  Deleted job mycronjob-1548748620
  Normal  SuccessfulCreate  11m                  cronjob-controller  Created job mycronjob-1548748680
  Normal  SuccessfulDelete  10m                  cronjob-controller  Deleted job mycronjob-1548748680
  Normal  SuccessfulCreate  10m                  cronjob-controller  Created job mycronjob-1548748740
  Normal  SuccessfulDelete  9m27s                cronjob-controller  Deleted job mycronjob-1548748740
  Normal  SuccessfulCreate  9m27s                cronjob-controller  Created job mycronjob-1548748800
  Normal  SuccessfulDelete  8m27s                cronjob-controller  Deleted job mycronjob-1548748800
  Normal  SuccessfulCreate  8m27s                cronjob-controller  Created job mycronjob-1548748860
  Normal  SuccessfulDelete  7m27s                cronjob-controller  Deleted job mycronjob-1548748860
  Normal  SuccessfulCreate  7m27s                cronjob-controller  Created job mycronjob-1548748920
  Normal  SuccessfulDelete  6m26s                cronjob-controller  Deleted job mycronjob-1548748920
  Normal  SuccessfulCreate  6m26s                cronjob-controller  Created job mycronjob-1548748980
  Normal  SuccessfulDelete  5m26s                cronjob-controller  Deleted job mycronjob-1548748980
  Normal  SuccessfulCreate  5m26s                cronjob-controller  Created job mycronjob-1548749040
  Normal  SuccessfulDelete  4m26s                cronjob-controller  Deleted job mycronjob-1548749040
  Normal  SuccessfulCreate  4m26s                cronjob-controller  Created job mycronjob-1548749100
  Normal  SuccessfulDelete  3m26s                cronjob-controller  Deleted job mycronjob-1548749100
  Normal  SuccessfulCreate  25s (x4 over 3m26s)  cronjob-controller  (combined from similar events): Created job mycronjob-1548749340
  Normal  SuccessfulDelete  25s (x3 over 2m26s)  cronjob-controller  (combined from similar events): Deleted job mycronjob-1548749280
One new job started every minute.
The previously running job is deleted every time a new job starts. This is concurrencyPolicy: Replace in action.
IMPORTANT: NO job completes. No job ever runs for its full 120 sleep seconds.
Use concurrencyPolicy: Replace only if you understand how it works and need exactly this feature: if it is time for a new job run and the previous run has not finished yet, the cron job replaces the currently running job with a new one. (Replace deletes the previous job, its Pod, and its logs.)
IMPORTANT: in this specific tutorial case, NO job ever completes successfully; it ALWAYS gets replaced. In your environment, Replace only replaces a job when a previous one is still running, which in production will probably be nearly never. Understand Replace and its implications: it may be wrong, or perfectly right, for your case.
These commands show only the latest running job and its Pods.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548749400   0/4           3s         3s
kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548749400-fcrrt   1/1     Running   0          3s
mycronjob-1548749400-n2wbs   1/1     Running   0          3s
kubectl get cronjobs
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   False     1        22s             13m
Delete cron job.
kubectl delete -f myCronJob.yaml
cronjob.batch "mycronjob" deleted
I am impressed by, and agree with, Kubernetes job and cron job functionality in general.
However, when a cron job has problems that trigger the backoffLimit functionality, it leaves no trace evidence.
A long-running job may sometimes experience intermittent and changing problems.
backoffLimit cleanup deletes crashed jobs, their Pods, and their logs; you are left with only the currently running job and Pod. kubectl get cronjob does not even hint that the cron job has problems.
kubectl describe shows misleading event messages: all seems OK, because only success events are shown.
Let's investigate.
From https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#pod-backoff-failure-policy
> Pod Backoff failure policy
> There are situations where you want to fail a Job after some amount of retries due to a logical error in configuration etc. To do so, set .spec.backoffLimit to specify the number of retries before considering a Job as failed. The back-off limit is set by default to 6.
> Failed Pods associated with the Job are recreated by the Job controller with an exponential back-off delay (10s, 20s, 40s …) capped at six minutes. The back-off count is reset if no new failed Pods appear before the Job's next status check.
The cron job below should run every minute.
It exits immediately upon start with error exit code 1.
It has a backoffLimit of 2: on failure, its Pod will be retried at most twice before the job is considered failed.
nano myCronJob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; exit 1']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      backoffLimit: 2
Create the cron job.
kubectl create -f myCronJob.yaml
cronjob.batch/mycronjob created
13 seconds later, the first Pod has tried to start and hit the exit 1 error.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548752820   0/1           13s        13s
kubectl get pods
NAME                         READY   STATUS             RESTARTS   AGE
mycronjob-1548752820-7czts   0/1     CrashLoopBackOff   1          13s
However, look at the kubectl describe events below.
The Saw completed job event makes it sound as if the job completed successfully, but it did not.
There is no indication of the exit 1 condition, and no indication of the CrashLoopBackOff status.
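While the failed Pod still exists, you can recover that missing information from the Pod itself rather than from the cron job. For example, using the Pod name from the output above:

kubectl describe pod mycronjob-1548752820-7czts

kubectl logs mycronjob-1548752820-7czts

describe pod shows the container's last exit code and restart count; logs shows the Pod's output. Once the job gets deleted, both are gone.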
kubectl describe cronjob/mycronjob
Name:      mycronjob
Schedule:  */1 * * * *
Pod Template:
  Command:
    echo Job Pod is Running ; exit 1
Active Jobs:  <none>
Events:
  Type    Reason            Age  From                Message
  ----    ------            ---  ----                -------
  Normal  SuccessfulCreate  23s  cronjob-controller  Created job mycronjob-1548752820
  Normal  SawCompletedJob   3s   cronjob-controller  Saw completed job: mycronjob-1548752820
A minute later, nothing in the output below hints at a problem with the cron job.
kubectl get cronjobs
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   False     0        55s             68s
The output below also does not hint at a problem.
It seems the first job is just running slowly: 77 seconds and still busy, zero COMPLETIONS.
In reality, the Pod of the first job crashed within its first second.
kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548752820   0/1           77s        77s
mycronjob-1548752880   0/1           17s        17s
If we look at kubectl describe we see:
kubectl describe cronjob/mycronjob
Name:  mycronjob
Events:
  Type    Reason            Age    From                Message
  ----    ------            ----   ----                -------
  Normal  SuccessfulCreate  2m26s  cronjob-controller  Created job mycronjob-1548752820
  Normal  SawCompletedJob   2m6s   cronjob-controller  Saw completed job: mycronjob-1548752820
  Normal  SuccessfulCreate  86s    cronjob-controller  Created job mycronjob-1548752880
  Normal  SawCompletedJob   66s    cronjob-controller  Saw completed job: mycronjob-1548752880
  Normal  SuccessfulDelete  66s    cronjob-controller  Deleted job mycronjob-1548752820
  Normal  SuccessfulCreate  26s    cronjob-controller  Created job mycronjob-1548752940
  Normal  SawCompletedJob   6s     cronjob-controller  Saw completed job: mycronjob-1548752940
  Normal  SuccessfulDelete  6s     cronjob-controller  Deleted job mycronjob-1548752880
Last line above: the first failed job is now deleted; its logs are gone.
The describe output above SEEMS to show all is well, but it is not: there is no indication of the CrashLoopBackOff status.
If an hourly cron job has such problems, you are left with little historical paper-trail evidence of what happened.
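One mitigation worth knowing (not used in this tutorial): the CronJob spec fields successfulJobsHistoryLimit and failedJobsHistoryLimit control how many finished jobs are retained; the defaults are 3 and 1 respectively. Raising the failed limit keeps more crashed jobs, and their Pods and logs, around for inspection. Both fields go at the same indentation level as schedule:

  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 5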
I do not enjoy troubleshooting cron jobs that use backoffLimit and behave this way.
Tip: write such cron jobs' log information to a persistent volume and use that as your primary research source. Everything is in one place and it is persistent; plus, you are in control of what information gets written to the logs.
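A minimal sketch of that tip, assuming a PersistentVolumeClaim named cron-logs already exists in the namespace (the claim name, mount path, and log file name are all illustrative):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            # append a timestamped line to a log file that outlives the Pod
            command: ['sh', '-c', 'echo "$(date) Job Pod is Running" >> /cronlogs/mycronjob.log']
            volumeMounts:
            - name: logdir
              mountPath: /cronlogs
          volumes:
          - name: logdir
            persistentVolumeClaim:
              claimName: cron-logs
          restartPolicy: OnFailure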
kubectl delete -f myCronJob.yaml
cronjob.batch "mycronjob" deleted
These spec fields (suspend, startingDeadlineSeconds, parallelism, completions, concurrencyPolicy, backoffLimit) each enable useful functionality.
Standalone, each field is easy to understand. Combined, these YAML spec fields lead to complex interactions and reactions, especially with unexpectedly long-running cron jobs.
Design your own simple tests to learn these features. Then design some complex tests to learn their interactions.
The moment you are running 2 or more different cron jobs simultaneously in 2 or more terminal windows, you will know: you are becoming an expert.