Kubernetes CPU Management Policies

By Alwyn Botha, Alibaba Cloud Community Blog author.

This tutorial demonstrates how to define Pods that run CPU-intensive workloads that are sensitive to context switches, along with workloads that have the characteristics listed below.

From https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/

CPU manager might help Kubernetes Pod workloads with the following characteristics:
Sensitive to CPU throttling effects.
Sensitive to context switches.
Sensitive to processor cache misses.
Benefits from sharing processor resources (e.g., data and instruction caches).
Sensitive to cross-socket memory traffic.
Sensitive to, or requires, hyperthreads from the same physical CPU core.

When your Kubernetes node is under light CPU load, there is no problem. There are enough CPU cores for all Pods to work as if they were the only Pods using the node's CPUs.

When many CPU-intensive Pods compete for CPU cores, they must share CPU time. As CPU time becomes available, a workload may be resumed on a different CPU core.

A significant part of CPU time is then spent switching between these workloads. Each such switch is called a context switch.

From https://en.wikipedia.org/wiki/Context_switch

A context switch is the process of storing the state of a process or of a thread, so that it can be restored and execution resumed from the same point later. This allows multiple processes to share a single CPU.

Context switches are usually computationally intensive.

Kubernetes allows you to configure its CPU Manager policy so that such workloads can run more efficiently.

The kubelet CPU Manager policy is set with the --cpu-manager-policy flag.

Two policies are supported:

none: the default, which gives the scheduling behavior you normally get on Linux systems.
static: certain Pods are given near exclusivity to CPU cores on a node.

near exclusivity: Kubernetes processes may still use part of the CPU time.

near exclusivity: all other Pods are prevented from using the allocated CPU core.

Certain Pods

Only Guaranteed Pods (Pods with matching integer CPU requests and limits) are granted exclusive access to the CPUs they request.

From https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/

This static assignment increases CPU affinity and decreases context switches due to throttling for the CPU-bound workload.

From https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed

For a Pod to be given a Quality of Service / QoS class of Guaranteed:
Every Container in the Pod must have a memory limit and a memory request, and they must be the same.
Every Container in the Pod must have a CPU limit and a CPU request, and they must be the same.
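The two rules above, together with the integer CPU requirement for exclusivity, can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this tutorial, not the kubelet's implementation. The function names and the dictionary layout are hypothetical, and memory quantities are compared as plain strings for simplicity.

```python
from fractions import Fraction


def cpu_millicores(quantity):
    """Convert a CPU quantity such as 1, "1.1" or "500m" to millicores."""
    text = str(quantity).strip()
    if text.endswith("m"):
        return int(text[:-1])
    return int(Fraction(text) * 1000)


def is_guaranteed(containers):
    """Apply the two rules quoted above to a list of container resource dicts."""
    if not containers:
        return False
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in requests or resource not in limits:
                return False
        if cpu_millicores(requests["cpu"]) != cpu_millicores(limits["cpu"]):
            return False
        # Simplification: memory quantities are compared as plain strings.
        if str(requests["memory"]) != str(limits["memory"]):
            return False
    return True


def wants_exclusive_cpus(container):
    """True when the CPU request is a positive whole number of cores."""
    millis = cpu_millicores(container["requests"]["cpu"])
    return millis > 0 and millis % 1000 == 0


whole = {"requests": {"cpu": 1, "memory": "10Mi"},
         "limits": {"cpu": 1, "memory": "10Mi"}}
fractional = {"requests": {"cpu": 1.1, "memory": "10Mi"},
              "limits": {"cpu": 1.1, "memory": "10Mi"}}

print(is_guaranteed([whole]), wants_exclusive_cpus(whole))
print(is_guaranteed([fractional]), wants_exclusive_cpus(fractional))
```

Both example containers satisfy the Guaranteed rules, but only the one with a whole-number CPU request qualifies for exclusive cores.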

Contents

shared pool of CPUs
minikube configuration
ERROR: configured policy "static" differs from state checkpoint policy "none"
Demo of CPU Management Policy
minikube kubelet cpumanager startup
not exclusive but guaranteed Pod
kube-reserved="cpu=500m"
CPU manager benchmarks

Shared Pool of CPUs

Initially all CPUs are available to all Pods. Guaranteed Pods remove CPUs from this availability pool.

When using the static policy, --kube-reserved or --system-reserved must be specified.

These settings reserve CPU resources for Kubernetes system daemons.

--kube-reserved resources are set aside first: Kubernetes itself must be able to run.

Guaranteed Pods then remove their requested integer CPU quantities from the shared CPU pool.

BestEffort and Burstable Pods use the remaining CPU pool. Their workloads may context switch from time to time, but this does not seriously affect their performance. (If it did, they would have been defined as Guaranteed.)
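The bookkeeping described in this section can be pictured with a toy model. The class below is a hypothetical sketch, not the kubelet's algorithm: it assumes the lowest-numbered CPUs are the reserved ones and that exclusive CPUs are handed out in ascending order, ignoring the CPU topology the real CPU manager takes into account.

```python
class SharedCpuPool:
    """Toy model of the static policy's CPU bookkeeping."""

    def __init__(self, num_cpus, num_reserved):
        self.all_cpus = set(range(num_cpus))
        # Assumption: the lowest-numbered CPUs are the reserved ones.
        self.reserved = set(range(num_reserved))
        self.exclusive = {}  # container name -> set of CPU ids

    def shared(self):
        """CPUs usable by every Pod that has no exclusive assignment."""
        assigned = set().union(*self.exclusive.values())
        return self.all_cpus - assigned

    def allocate(self, container, count):
        """Give `count` whole CPUs to a container, never a reserved one."""
        candidates = sorted(self.shared() - self.reserved)
        if count > len(candidates):
            raise RuntimeError("not enough CPUs for exclusive assignment")
        self.exclusive[container] = set(candidates[:count])
        return self.exclusive[container]

    def release(self, container):
        """Return a container's CPUs to the shared pool."""
        self.exclusive.pop(container, None)


pool = SharedCpuPool(num_cpus=4, num_reserved=1)
print(sorted(pool.shared()))
print(sorted(pool.allocate("example-container", 1)))
print(sorted(pool.shared()))
pool.release("example-container")
print(sorted(pool.shared()))
```

Reserved CPUs stay in the shared pool in this model. They are only excluded from exclusive assignment, so system daemons and non-exclusive Pods can still run on them.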

Minikube Configuration

This tutorial uses minikube. To use static CPU Management Policies we need to set these kubelet settings:

cpu-manager-policy="static"
kube-reserved="cpu=500m" ... reserve a bit of CPU for Kubernetes daemons
feature-gates="CPUManager=true" ... switch on capability to use static CPU allocations.

minikube start --extra-config=kubelet.cpu-manager-policy="static" --extra-config=kubelet.kube-reserved="cpu=500m" --extra-config=kubelet.feature-gates="CPUManager=true"

Minikube will start up as normal. It will hang only if you make a syntax error.

The message "Everything looks great. Please enjoy minikube!" means the settings were applied successfully.

ERROR: configured policy "static" differs from state checkpoint policy "none"

Unfortunately, the following error is not mentioned in the documentation.

You cannot change your Kubernetes cluster from CPU policy "none" to CPU policy "static" simply by adding those flags to minikube start.

You will get this error at the bottom of journalctl -u kubelet:

3110 server.go:262] failed to run Kubelet: could not initialize checkpoint manager: could not restore state from checkpoint: configured policy "static" differs from state checkpoint policy "none"
Feb 07 07:08:43 minikube kubelet[3110]: Please drain this node and delete the CPU manager checkpoint file "/var/lib/kubelet/cpu_manager_state" before restarting Kubelet.

To fix this problem, first get a list of your nodes:

kubectl get nodes

NAME       STATUS   ROLES    AGE   VERSION
minikube   Ready    master   42d   v1.12.4

Then drain the node by name:

kubectl drain minikube

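Move /var/lib/kubelet/cpu_manager_state aside so that the kubelet no longer finds it; the renamed file serves as a backup. Run this on the node itself (for minikube, inside a minikube ssh session):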

mv /var/lib/kubelet/cpu_manager_state /var/lib/kubelet/cpu_manager_state-old

Allow the node to accept work again:

kubectl uncordon minikube
node/minikube uncordoned

If you now start minikube with the static CPU Management Policy flags, it should work.

Demo of CPU Management Policy

We need to define a Guaranteed Pod:

nano myguaranteed.yaml

apiVersion: v1
kind: Pod
metadata:
  name: myguaranteed-pod
spec:
  containers:
  - name: myram-container-1
    image: mytutorials/centos:bench
    imagePullPolicy: IfNotPresent
    
    command: ['sh', '-c', 'stress --vm 1 --vm-bytes 5M --vm-hang 3000 -t 3600']
    
    resources:
      limits:
        memory: "10Mi"
        cpu: 1
      requests:
        memory: "10Mi"
        cpu: 1
    
  restartPolicy: Never
  terminationGracePeriodSeconds: 0

Kubernetes Millicores

You can specify CPU requests and limits as whole integers or in millicores.

One CPU core equals 1000m (1000 millicores).

A 4-core node has a CPU capacity of 4000m:

4 cores * 1000m = 4000m

CPU Management Policies understand both formats. (Specifying 1000m for both CPU values would also work in our example above.)
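To make the equivalence of the two notations concrete, here is a small Python sketch that normalizes a CPU quantity to millicores. The helper name to_millicores is hypothetical, and the parser only covers the plain forms used in this tutorial (integers, decimals, and the m suffix).

```python
from decimal import Decimal


def to_millicores(quantity):
    """Normalize 1, "1", "1.1" or "1000m" to an integer number of millicores."""
    text = str(quantity).strip()
    if text.endswith("m"):
        return int(text[:-1])
    millis = Decimal(text) * 1000
    if millis != millis.to_integral_value():
        # Assumption of this sketch: one millicore is the finest step accepted.
        raise ValueError(f"{quantity!r} is finer than one millicore")
    return int(millis)


# Both spellings of one full core normalize to the same value.
print(to_millicores(1), to_millicores("1000m"))

# A 4-core node expressed in millicores.
print(4 * to_millicores(1))

# The fractional request used later in this tutorial.
print(to_millicores("1.1"))
```

Comparing normalized values, rather than the raw strings, is what makes cpu: 1 and cpu: 1000m interchangeable in the Pod spec.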

kubectl create -f myguaranteed.yaml

pod/myguaranteed-pod created

Here is an actual extract from the minikube logs moments after the Pod was created:

Feb 07 07:41:39 minikube kubelet[2866]: I0207 07:41:39.833658    2866 policy_static.go:175] [cpumanager] static policy: AddContainer (pod: myguaranteed-pod, container: myram-container-1, container id: 07ff2dc06a922ffbddeb6bb3894492458764d7413cdc7d0552a26910da6ff13d)
Feb 07 07:41:39 minikube kubelet[2866]: I0207 07:41:39.833693    2866 policy_static.go:205] [cpumanager] allocateCpus: (numCPUs: 1)
Feb 07 07:41:39 minikube kubelet[2866]: I0207 07:41:39.833709    2866 state_mem.go:84] [cpumanager] updated default cpuset: "0,2-3"
Feb 07 07:41:39 minikube kubelet[2866]: I0207 07:41:39.834392    2866 policy_static.go:213] [cpumanager] allocateCPUs: returning "1"
Feb 07 07:41:39 minikube kubelet[2866]: I0207 07:41:39.834412    2866 state_mem.go:76] [cpumanager] updated desired cpuset (container id: 07ff2dc06a922ffbddeb6bb3894492458764d7413cdc7d0552a26910da6ff13d, cpuset: "1")

Logs edited for readability:

07:41:39 policy_static.go:175] [cpumanager] static policy: AddContainer (pod: myguaranteed-pod, container: myram-container-1, container id: 07ff2dc06a922ffbddeb6bb3894492458764d7413cdc7d0552a26910da6ff13d)

07:41:39 policy_static.go:205] [cpumanager] allocateCpus: (numCPUs: 1)

07:41:39 state_mem.go:84] [cpumanager] updated default cpuset: "0,2-3"

07:41:39 policy_static.go:213] [cpumanager] allocateCPUs: returning "1"

07:41:39 state_mem.go:76] [cpumanager] updated desired cpuset (container id: 07ff2dc06a922ffbddeb6bb3894492458764d7413cdc7d0552a26910da6ff13d, cpuset: "1")

myguaranteed-pod is now under management of the static policy.
It gets 1 CPU allocated: allocateCpus: (numCPUs: 1), and the desired cpuset for its container is "1".
updated default cpuset: "0,2-3" ... the remaining Pods can use these CPUs.
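The cpuset strings in these log lines use a compact list format: comma-separated CPU ids, with a dash for inclusive ranges. A minimal Python sketch for reading and writing that format follows. The function names are hypothetical and the parser assumes well-formed input like the strings shown in the logs.

```python
def parse_cpuset(text):
    """Turn a string such as "0,2-3" into a set of CPU ids."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpuset(cpus):
    """Turn a set of CPU ids back into the compact string form."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu  # extend the current run
        else:
            ranges.append([cpu, cpu])  # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


default_pool = parse_cpuset("0,2-3")
exclusive = parse_cpuset("1")
print(sorted(default_pool))
print(format_cpuset(default_pool | exclusive))
```

Taking the union of the default cpuset and the exclusive cpuset gives back every CPU on the node, which is what happens to the default cpuset once the exclusive container is removed.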

kubectl describe pods/myguaranteed-pod | grep QoS

QoS Class: Guaranteed

kubectl delete -f myguaranteed.yaml
pod "myguaranteed-pod" deleted

kubelet logs after delete:

Feb 07 08:18:50 minikube kubelet[2866]: I0207 08:18:50.899414    2866 policy_static.go:195] [cpumanager] static policy: RemoveContainer (container id: b900c86f17b6da1899153af6728b6944318107fbcf78a97942667b600e65f6dd)

Feb 07 08:18:50 minikube kubelet[2866]: I0207 08:18:50.909928    2866 state_mem.go:84] [cpumanager] updated default cpuset: "0-3"

Default cpuset updated to include all CPUs again.

Minikube Kubelet cpumanager Startup

Here are some interesting log lines showing the cpumanager startup cpuset assignments.

Raw log lines

Feb 07 11:55:28 minikube kubelet[2847]: I0207 11:55:28.642182    2847 cpu_manager.go:113] [cpumanager] detected CPU topology: &{4 4 1 map[1:{0 1} 2:{0 2} 3:{0 3} 0:{0 0}]}
Feb 07 11:55:28 minikube kubelet[2847]: I0207 11:55:28.642203    2847 policy_static.go:97] [cpumanager] reserved 1 CPUs ("0") not available for exclusive assignment
Feb 07 11:55:28 minikube kubelet[2847]: I0207 11:55:28.642211    2847 state_mem.go:36] [cpumanager] initializing new in-memory state store
Feb 07 11:55:28 minikube kubelet[2847]: I0207 11:55:28.642486    2847 state_mem.go:84] [cpumanager] updated default cpuset: "0-3"
Feb 07 11:55:28 minikube kubelet[2847]: I0207 11:55:28.642494    2847 state_mem.go:92] [cpumanager] updated cpuset assignments: "map[]"

Edited for clarity.

11:55:28  [cpumanager] detected CPU topology: &{4 4 1 map[1:{0 1} 2:{0 2} 3:{0 3} 0:{0 0}]}

11:55:28  [cpumanager] reserved 1 CPUs ("0") not available for exclusive assignment

11:55:28  [cpumanager] initializing new in-memory state store

11:55:28  [cpumanager] updated default cpuset: "0-3"

11:55:28  [cpumanager] updated cpuset assignments: "map[]"

CPU manager detected 4 CPUs.
reserved 1 CPUs ("0") not available for exclusive assignment ... this follows from --kube-reserved="cpu=500m". The static policy can only hand out whole CPUs, so the 500m reservation is rounded up and all of CPU 0 is withheld from exclusive assignment.
default cpuset: "0-3" ... all 4 CPUs are available to all Pods, including the remaining 500 millicores of CPU 0.
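The arithmetic behind the "reserved 1 CPUs" line can be sketched as follows. This assumes the static policy rounds the combined CPU reservation up to whole CPUs, which is consistent with the log above. The function names are hypothetical and this is not kubelet code.

```python
import math


def reserved_whole_cpus(kube_reserved_m, system_reserved_m=0):
    """Round the total reservation (in millicores) up to whole CPUs."""
    return math.ceil((kube_reserved_m + system_reserved_m) / 1000)


def exclusive_capacity(num_cpus, kube_reserved_m, system_reserved_m=0):
    """Number of CPUs left over for exclusive assignment."""
    return num_cpus - reserved_whole_cpus(kube_reserved_m, system_reserved_m)


# The setup used in this tutorial: 4 CPUs, kube-reserved="cpu=500m".
print(reserved_whole_cpus(500))
print(exclusive_capacity(4, 500))
```

Under this assumption, a 4-CPU node with a 500m reservation can give at most 3 CPUs to exclusive containers, even though 3500m remains usable overall.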

Not Exclusive but Guaranteed Pod

If we create a Guaranteed Pod with fractional CPU requests and limits, no lines about cpuset changes are added to the kubelet logs.

Such Pods share the CPUs in the pool: default cpuset: "0-3"

nano my-not-exclusive-guaranteed.yaml

apiVersion: v1
kind: Pod
metadata:
  name: my-not-exclusive-guaranteed-pod
spec:
  containers:
  - name: myram-container-1
    image: mytutorials/centos:bench
    imagePullPolicy: IfNotPresent
    
    command: ['sh', '-c', 'stress --vm 1 --vm-bytes 5M --vm-hang 3000 -t 3600']
    
    resources:
      limits:
        memory: "10Mi"
        cpu: 1.1
      requests:
        memory: "10Mi"
        cpu: 1.1
    
  restartPolicy: Never
  terminationGracePeriodSeconds: 0

kubectl create -f my-not-exclusive-guaranteed.yaml

pod/my-not-exclusive-guaranteed-pod created

If you now investigate the tail end of minikube logs you will not find any mention of cpuset being changed.

kubectl delete pod myguaranteed-pod
kubectl delete pod my-not-exclusive-guaranteed-pod

kube-reserved="cpu=500m"

You can see how this setting (which we used right at the start of the tutorial) reserved CPU resources for Kubernetes.

Here is an extract from my 4-CPU Kubernetes node. Note that 500m of CPU is reserved.


kubectl describe node minikube
  
Capacity:
 cpu:                4

Allocatable:
 cpu:                3500m

A capacity of cpu: 4 equals 4000m. Subtracting the 500m reservation leaves 3500m allocatable.

CPU Manager Benchmarks

This tutorial gave you practical experience with the Kubernetes CPU manager. You can find more theoretical information about it, including benchmarks, on the official Kubernetes blog: https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/
