This topic provides answers to some frequently asked questions about the scheduling policies of Container Service for Kubernetes (ACK) clusters.
Table of contents
How do I avoid pod startup failures due to insufficient IP addresses provided by the vSwitch?
How do I migrate from ack-descheduler to Koordinator Descheduler?
What is the default scheduling policy of the ACK scheduler?
How do I avoid scheduling pods to hotspot nodes?
Why are pods not scheduled to the new node that I added to the cluster?
Why does the system display a message indicating insufficient CPU or memory resources during scheduling even if the CPU or memory usage in the cluster is not high?
What are the precautions for using the descheduling feature in ACK? Does this feature restart pods?
How do I schedule pods to specific nodes?
How do I schedule a specific number of pods created by a Deployment to an ECS instance and an elastic container instance?
How do I ensure the high availability of pods when scheduling the pods of a workload?
How do I avoid pod startup failures due to insufficient IP addresses provided by the vSwitch?
The Kubernetes-native scheduler cannot detect whether the vSwitch has sufficient available IP addresses. As a result, even when IP addresses are exhausted, the scheduler keeps scheduling pods to the affected nodes, the pods fail to start, and the number of faulty pods grows quickly. To resolve this issue, the ACK scheduler adds the k8s.aliyun.com/max-available-ip annotation to each node to specify the maximum number of available IP addresses on that node. Based on this annotation and whether a pod requires an exclusive IP address, the scheduler limits the number of pods that can be scheduled to the node. In addition, when the scheduler detects that a node has exhausted its IP addresses, it updates the SufficientIP condition in the node status so that no new pods that require exclusive IP addresses are scheduled to that node. This prevents large-scale pod failures caused by insufficient IP addresses.
To use the k8s.aliyun.com/max-available-ip annotation, your cluster and the kube-scheduler component must meet the following requirements:
The cluster must be an ACK Pro cluster with Terway 1.5.7 or later installed. For more information, see Create an ACK managed cluster.
The kube-scheduler version must meet the following requirements:
Kubernetes version    kube-scheduler version
1.30                  All versions
1.28                  V1.28.3-aliyun-6.3 or later
1.26                  V1.26.3-aliyun-6.3 or later
1.24                  V1.24.6-aliyun-6.3 or later
1.22                  V1.22.15-aliyun-6.3 or later
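The following is an illustrative sketch of how the annotation and condition might appear on a node object; the node name, annotation value, and condition fields below are assumptions based on the description above, so verify them in your own cluster (for example, with kubectl describe node).
apiVersion: v1
kind: Node
metadata:
  name: example-node                        # hypothetical node name
  annotations:
    k8s.aliyun.com/max-available-ip: "10"   # maximum number of available IP addresses on this node (example value)
status:
  conditions:
  - type: SufficientIP                      # assumed to change when the node exhausts its IP addresses
    status: "True"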
How do I migrate from ack-descheduler to Koordinator Descheduler?
ACK provides the ack-descheduler descheduling component on the Marketplace page of the ACK console. The component is developed based on the open source Kubernetes Descheduler. ack-descheduler is available in versions 0.20.0 and 0.27.1, which provide the same features and are used in the same way as open source Kubernetes Descheduler 0.20.0 and 0.27.1.
ack-descheduler has been discontinued. We recommend that you migrate from ack-descheduler to Koordinator Descheduler. The migration procedure is similar to the procedure for migrating from Kubernetes Descheduler to Koordinator Descheduler. For more information, see Migrate from Kubernetes Descheduler to Koordinator Descheduler.
What is the default scheduling policy of the ACK scheduler?
In ACK clusters, the default scheduling policy of the ACK scheduler is the same as that of the open source Kubernetes scheduler. When the Kubernetes scheduler decides how to schedule a pod to a node, it performs two key steps: Filter and Score.
Filter: This step searches for schedulable nodes. If the node list is empty, the pod cannot be scheduled.
Score: This step scores and ranks the schedulable nodes in order to select the most suitable node for the pod.
For more information about the Filter and Score plug-ins enabled in the latest version of the ACK scheduler, see Filter and score plug-ins.
How do I avoid scheduling pods to hotspot nodes?
In the Kubernetes-native scheduling policy, the scheduler schedules pods based on resource requests and does not consider the actual resource utilization of nodes. Scheduling decisions are made by various Filter and Score plug-ins working together. We recommend that you use the following features of ACK clusters to avoid scheduling pods to hotspot nodes:
Configure appropriate resource requests and limits for each pod to ensure resource redundancy, as shown in the example after this list. You can use the resource profiling feature to obtain suggested container specifications based on the analysis of historical resource usage data. For more information, see Resource profiling.
Enable load-aware scheduling. The load-aware scheduling feature of the kube-scheduler component provided by ACK is designed based on the Kubernetes scheduling framework. Unlike the Kubernetes-native scheduling policy, the ACK scheduler can obtain the actual resource usage of nodes and schedules pods to nodes with low loads based on the historical load statistics of the nodes to balance loads across nodes. For more information, see Use load-aware scheduling.
Enable load-aware hotspot descheduling. As factors such as time, the cluster environment, and workload traffic or requests change dynamically, the loads on nodes may become imbalanced. To resolve this issue, the ACK scheduler provides the descheduling feature. For more information, see Work with load-aware hotspot descheduling.
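As a reference for the first item in the list above, the following is a minimal example of explicit resource requests and limits; the pod name, container name, image, and values are placeholders, and real values should come from resource profiling or your own measurements.
apiVersion: v1
kind: Pod
metadata:
  name: example-app            # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx:1.25          # example image
    resources:
      requests:
        cpu: 500m              # amount the scheduler reserves on the node
        memory: 512Mi
      limits:
        cpu: "1"               # hard cap enforced at runtime
        memory: 1Gi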
Why are pods not scheduled to the new node that I added to the cluster?
This issue can have various causes. Perform the following checks to troubleshoot the issue:
Check whether the node status is normal. Pods cannot be scheduled to a node in the NotReady state.
Check whether the pod is configured with an inappropriate scheduling policy, such as NodeSelector, NodeAffinity, or PodAffinity, or whether the node has taints that the pod does not tolerate, as shown in the example after this list. If one of the preceding conditions exists, the pod cannot be scheduled to the new node.
Check whether the issue is related to the Kubernetes-native scheduling policy. In the Kubernetes-native scheduling policy, the scheduler schedules pods based on resource requests and does not consider the actual utilization of nodes. As a result, a large number of pods run on some nodes, and a few or no pods run on other nodes.
For more information about how to solve this problem, see How do I avoid scheduling pods to hotspot nodes?
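As an example for the second check above, if the new node carries a taint (the key, value, and node name below are illustrative), only pods that declare a matching toleration can be scheduled to it:
# Taint added to the node, for example:
#   kubectl taint nodes <node-name> dedicated=gpu:NoSchedule
# Matching toleration in the pod spec:
spec:
  tolerations:
  - key: dedicated
    operator: Equal
    value: gpu
    effect: NoSchedule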
Why does the system display a message indicating insufficient CPU or memory resources during scheduling even if the CPU or memory usage in the cluster is not high?
In the Kubernetes-native scheduling policy, the scheduler schedules pods based on resource requests and does not consider the actual utilization of nodes. Even if the actual CPU usage of the cluster is not high, pods may still fail to be scheduled due to insufficient CPU or memory resources.
For more information about how to resolve this problem, see How do I avoid scheduling pods to hotspot nodes?
What are the precautions for using the descheduling feature in ACK? Does this feature restart pods?
ACK Koordinator Descheduler provides the descheduling feature. When you use the descheduling feature, take note of the following items:
Koordinator Descheduler only evicts running pods and does not recreate or schedule the evicted pods. After a pod is evicted, it is recreated by the workload controller, such as a Deployment or StatefulSet, and the recreated pod is scheduled by the scheduler.
During the descheduling process, old pods are evicted and new pods are created. Make sure that your application has sufficient replicas to prevent application availability from being affected during eviction.
For more information, see Descheduling overview.
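As a reference for the replica recommendation above, the following is a minimal sketch of a Deployment that keeps multiple replicas, together with an optional PodDisruptionBudget that caps concurrent disruptions; the names, images, and numbers are placeholders, and whether the descheduler honors the PodDisruptionBudget depends on its configuration, so check the linked overview.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                    # hypothetical workload name
spec:
  replicas: 3                  # more than one replica so that evicting a single pod is tolerable
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25      # example image
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb                # optional: limits how many pods may be unavailable at the same time
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web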
How do I schedule pods to specific nodes?
You can configure node labels and configure the nodeSelector parameter in the YAML file to schedule pods to specific nodes. For more information, see Schedule pods to specific nodes.
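For example (the node label, pod name, and image below are placeholders), you can label the target node and reference that label from the pod's nodeSelector field:
# Label the node first, for example:
#   kubectl label nodes <node-name> workload-type=web
# Then reference the label in the pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: web-pod
spec:
  nodeSelector:
    workload-type: web         # the pod is only scheduled to nodes that carry this label
  containers:
  - name: web
    image: nginx:1.25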
How do I schedule a specific number of pods created by a Deployment to an ECS instance and an elastic container instance?
In scenarios in which Elastic Compute Service (ECS) instances and elastic container instances are colocated, you can use the UnitedDeployment controller to manage workloads and adjust the number of replicas in each subset. For example, you can set replicas to 10 in subset-ecs and replicas to 10 in subset-eci in the UnitedDeployment YAML file, as shown in the sketch below. For more information, see Use the UnitedDeployment controller in ACK clusters.
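The following is a minimal sketch of the subset configuration of such a UnitedDeployment, assuming the OpenKruise-style apps.kruise.io/v1alpha1 API; the subset names match the example above, but the metadata and node selector terms are illustrative, so verify the field names and the labels that distinguish ECS nodes from elastic container instances against the linked topic.
apiVersion: apps.kruise.io/v1alpha1      # assumed API group for UnitedDeployment
kind: UnitedDeployment
metadata:
  name: sample-ud                        # hypothetical name
spec:
  replicas: 20                           # total replicas across all subsets
  topology:
    subsets:
    - name: subset-ecs
      replicas: 10                       # pods placed on ECS nodes
      nodeSelectorTerm:
        matchExpressions:
        - key: type                      # illustrative label for distinguishing node types
          operator: NotIn
          values:
          - virtual-kubelet
    - name: subset-eci
      replicas: 10                       # pods placed on elastic container instances (virtual nodes)
      nodeSelectorTerm:
        matchExpressions:
        - key: type
          operator: In
          values:
          - virtual-kubelet
  # The full manifest also requires spec.selector and a workload template, such as spec.template.deploymentTemplate.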
How do I ensure the high availability of pods when scheduling the pods of a workload?
You can use inter-pod affinity and anti-affinity to distribute the pods of a workload across different zones or nodes. For example, you can add the following fields to the pod specification to define a preferredDuringSchedulingIgnoredDuringExecution preference rule. This way, the scheduler attempts to schedule pods with the security=S2 label to different zones. If the rule cannot be satisfied, the scheduler schedules the pods to other nodes.
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: topology.kubernetes.io/zone
For more information, see Inter-pod affinity and anti-affinity and Pod topology spread constraints.
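For comparison, a pod topology spread constraint can express a similar intent; the following minimal sketch spreads pods that carry the security=S2 label across zones and still allows scheduling when the spread cannot be satisfied:
spec:
  topologySpreadConstraints:
  - maxSkew: 1                                # maximum allowed difference in matching pod counts between zones
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway         # soft constraint, similar to the preferred rule above
    labelSelector:
      matchLabels:
        security: S2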