You can use ack-descheduler to optimize the scheduling of pods that cannot be matched with suitable nodes. This avoids resource waste and improves resource utilization in Container Service for Kubernetes (ACK) clusters. This topic describes how to use ack-descheduler to optimize pod scheduling.
ack-descheduler is no longer maintained. We recommend that you migrate to the currently maintained component Koordinator Descheduler. For more information, see [Component Notice] ack-descheduler migration.
Prerequisites
An ACK cluster that runs Kubernetes 1.14 or later is created. For more information, see Create an ACK managed cluster.
A kubectl client is connected to the cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Helm 3.0 or later is installed in the cluster. For more information, see Update Helm V2 to Helm V3.
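You can check the installed Helm version from the command line. For example:

helm version --short

The output must show a v3.x.x version.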
Install ack-descheduler
Log on to the ACK console.
In the left-side navigation pane of the ACK console, choose Marketplace > Marketplace.
On the Marketplace page, click the App Catalog tab. Find and click ack-descheduler.
On the ack-descheduler page, click Deploy.
In the Deploy wizard, select a cluster and a namespace, and then click Next.
On the Parameters wizard page, configure the parameters and click OK.
After ack-descheduler is installed, a CronJob is automatically created in the kube-system namespace. By default, this CronJob runs every 2 minutes. After ack-descheduler is installed, you are directed to the ack-descheduler-default page. If all the relevant resources are created, the component is installed.
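You can also verify the CronJob from the command line. This is a quick check that assumes the CronJob name contains "descheduler", which may vary with the chart version:

kubectl get cronjob -n kube-system | grep descheduler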
Use ack-descheduler to optimize pod scheduling
Run the following command to check the DeschedulerPolicy setting of the ack-descheduler-default ConfigMap:
kubectl describe cm ack-descheduler-default -n kube-system
Expected output (the exact content depends on the component version and your settings; the following is a representative excerpt with parameter values elided):
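Name:         ack-descheduler-default
Namespace:    kube-system
Labels:       ...
Annotations:  ...

Data
====
policy.yaml:
----
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveDuplicates":
    enabled: true
  "RemovePodsViolatingInterPodAntiAffinity":
    enabled: true
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        ...
  "RemovePodsHavingTooManyRestarts":
    enabled: true
    params:
      ...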
The following table describes the scheduling policies returned in the preceding output. For more information about the policy settings in the strategies section, see Descheduler.

Policy: RemoveDuplicates
Description: This policy removes duplicate pods and ensures that at most one pod that belongs to the same ReplicaSet, ReplicationController, StatefulSet, or Job runs on a node.

Policy: RemovePodsViolatingInterPodAntiAffinity
Description: This policy deletes pods that violate inter-pod anti-affinity rules.

Policy: LowNodeUtilization
Description: This policy finds underutilized nodes and evicts pods from overutilized nodes so that the evicted pods can be recreated on the underutilized nodes. The parameters of this policy are configured in the nodeResourceUtilizationThresholds section (see the sketch after this table).

Policy: RemovePodsHavingTooManyRestarts
Description: This policy deletes pods that have been restarted more than a specified number of times.
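The following sketch shows how the nodeResourceUtilizationThresholds section is structured in the descheduler/v1alpha1 policy format. The percentage values are illustrative, not defaults that you should rely on:

"LowNodeUtilization":
  enabled: true
  params:
    nodeResourceUtilizationThresholds:
      thresholds:        # A node whose usage is below all of these percentages is considered underutilized.
        cpu: 20
        memory: 20
        pods: 20
      targetThresholds:  # A node whose usage is above any of these percentages is a candidate for eviction.
        cpu: 50
        memory: 50
        pods: 50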
Verify pod scheduling before the scheduling policy is modified.
Create a Deployment to test the scheduling.
Create an nginx.yaml file and copy the following content to the file:
apiVersion: apps/v1 # Use apps/v1beta1 for Kubernetes versions earlier than 1.8.0.
kind: Deployment
metadata:
  name: nginx-deployment-basic
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9 # Replace with the image that you want to use. The value must be in the <image_name:tags> format.
        ports:
        - containerPort: 80
Run the following command to create a Deployment with the nginx.yaml file:
kubectl apply -f nginx.yaml
Expected output:
deployment.apps/nginx-deployment-basic created
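Optionally, run the following command to wait until the rollout completes before you check pod placement:

kubectl rollout status deployment/nginx-deployment-basic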
Wait 2 minutes and run the following command to check the nodes to which the pods are scheduled:
kubectl get pod -o wide | grep nginx
Expected output:
NAME                         READY   STATUS    RESTARTS   AGE   IP               NODE                         NOMINATED NODE   READINESS GATES
nginx-deployment-basic-**1   1/1     Running   0          36s   172.25.XXX.XX1   cn-hangzhou.172.16.XXX.XX2   <none>           <none>
nginx-deployment-basic-**2   1/1     Running   0          11s   172.25.XXX.XX2   cn-hangzhou.172.16.XXX.XX3   <none>           <none>
nginx-deployment-basic-**3   1/1     Running   0          36s   172.25.XXX.XX3   cn-hangzhou.172.16.XXX.XX3   <none>           <none>
The output shows that pod nginx-deployment-basic-**2 and pod nginx-deployment-basic-**3 are scheduled to the same node cn-hangzhou.172.16.XXX.XX3.
Note: If you use the default settings of the ack-descheduler-default ConfigMap, the scheduling result varies based on the actual conditions of the cluster.
Modify the scheduling policy.
If you use multiple scheduling policies at the same time, unexpected scheduling results may occur. To prevent this issue, modify the ConfigMap described in Step 1 to retain only the RemoveDuplicates policy.
Note: The RemoveDuplicates policy ensures that pods managed by replication controllers are evenly distributed across different nodes.
In this example, the modified ConfigMap is saved to a file named newPolicy.yaml. The file contains the following content:
apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler
  namespace: kube-system
  labels:
    app.kubernetes.io/instance: descheduler
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: descheduler
    app.kubernetes.io/version: 0.20.0
    helm.sh/chart: descheduler-0.20.0
  annotations:
    meta.helm.sh/release-name: descheduler
    meta.helm.sh/release-namespace: kube-system
data:
  policy.yaml: |-
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "RemoveDuplicates": # Retain only the RemoveDuplicates policy.
        enabled: true
Verify pod scheduling after the scheduling policy is modified.
Run the following command to apply the new scheduling policy:
kubectl apply -f newPolicy.yaml
Expected output:
configmap/descheduler created
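To confirm the content of the applied policy, you can print the policy.yaml key of the ConfigMap:

kubectl get cm descheduler -n kube-system -o jsonpath='{.data.policy\.yaml}'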
Wait 2 minutes and run the following command to check the nodes to which the pods are scheduled:
kubectl get pod -o wide | grep nginx
Expected output:
NAME                         READY   STATUS    RESTARTS   AGE     IP               NODE                         NOMINATED NODE   READINESS GATES
nginx-deployment-basic-**1   1/1     Running   0          8m26s   172.25.XXX.XX1   cn-hangzhou.172.16.XXX.XX2   <none>           <none>
nginx-deployment-basic-**2   1/1     Running   0          8m1s    172.25.XXX.XX2   cn-hangzhou.172.16.XXX.XX1   <none>           <none>
nginx-deployment-basic-**3   1/1     Running   0          8m26s   172.25.XXX.XX3   cn-hangzhou.172.16.XXX.XX3   <none>           <none>
The output shows that pod nginx-deployment-basic-**2 is rescheduled to cn-hangzhou.172.16.XXX.XX1 by ack-descheduler. Each of the three test pods now runs on a different node. This balances pod scheduling among multiple nodes.
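If you want to see which pods ack-descheduler decided to evict, you can inspect the logs of the most recent Job created by the CronJob. The Job name below is a placeholder; list the Jobs first and substitute the latest name:

kubectl get jobs -n kube-system | grep descheduler
kubectl logs -n kube-system -l job-name=<latest-descheduler-job-name>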