
Configure priority-based resource scheduling

Updated at: 2025-01-08 16:36

Priority-based resource scheduling is an elastic scheduling policy provided by Alibaba Cloud. It allows you to use the ResourcePolicy resource to define the order in which application instance pods are scheduled to different types of node resources during deployment or scale-out activities. During scale-in activities, pods are removed in the reverse order of the original scheduling sequence.

Important

Starting from kube-scheduler v1.x.x-aliyun-6.4, the ignorePreviousPod parameter of the priority-based resource scheduling feature defaults to false, and the ignoreTerminatingPod parameter defaults to true. Existing ResourcePolicies that use these parameters are not affected by this change or by subsequent updates.

Prerequisites

  • You have created an ACK Pro cluster running Kubernetes 1.20.11 or later. For details on upgrading, see Upgrade the Kubernetes Version of an ACK Cluster.

  • Ensure the scheduler version meets the requirements for different ACK cluster versions. For more information on the features supported by different scheduler versions, see kube-scheduler.

    ACK version      Scheduler version
    1.20             v1.20.4-ack-7.0 or later
    1.22             v1.22.15-ack-2.0 or later
    1.24 or later    All versions are supported

  • If you want to use ECI resources, the ack-virtual-node component must be deployed. For more information, see Use ECI in ACK.

Limits

  • This feature cannot be used in conjunction with the pod-deletion-cost feature. For more information about pod-deletion-cost, see pod-deletion-cost.

  • This feature does not support concurrent use with ECI-based elastic scheduling. For more information about ECI-based elastic scheduling, see Elastic Scheduling with ElasticResource (deprecated).

  • Currently, this feature uses the BestEffort policy and does not guarantee that pods are removed in reverse order during scale-in activities.

  • The max parameter is only available if your cluster is running Kubernetes 1.22 or later and the scheduler version is 5.0 or higher.

  • When using this feature with elastic node pools, invalid nodes may be added. Ensure that the elastic node pools are included in units and do not specify the max parameter for the units.

  • If your scheduler version is below 5.0 or the Kubernetes version of your cluster is 1.20 or earlier, existing pods are prioritized during scale-in activities, even if the ResourcePolicy is created after them.

  • If your scheduler version is below 6.1 or the Kubernetes version of your cluster is 1.20 or earlier, do not modify the ResourcePolicy until all pods selected by it are deleted.

Procedure

To define priority-based resource scheduling, create a ResourcePolicy:

apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: test
  namespace: default
spec:
  selector:
    key1: value1
  strategy: prefer
  units:
  - nodeSelector:
      unit: first
    resource: ecs
  - nodeSelector:
      unit: second
    max: 10
    resource: ecs
  - resource: eci
  # Optional, Advanced Configurations
  preemptPolicy: AfterAllUnits
  ignorePreviousPod: false
  ignoreTerminatingPod: true
  matchLabelKeys:
  - pod-template-hash
  podLabels:
    key1: value1
  podAnnotations:
    key1: value1
  whenTryNextUnits:
    policy: TimeoutOrExceedMax
    timeout: 1m
  • selector: Specifies the pods to which the ResourcePolicy applies. In this example, the ResourcePolicy applies to pods in the same namespace that have the label key1=value1. If selector is not set, the ResourcePolicy applies to all pods in the namespace.

  • strategy: Defines the scheduling strategy. Currently, only prefer is supported.

  • units: Defines the scheduling units. During scale-out, pods are scheduled to the units in the order in which the units are listed. During scale-in, pods are removed in the reverse order.

    • resource: Specifies the type of elastic resources. Supported types include eci, ecs, elastic, and acs. The elastic type is available for clusters running Kubernetes 1.24 or later with a scheduler version of 6.4.3 or higher. The acs type is available for clusters running Kubernetes 1.26 or later with a scheduler version of 6.7.1 or higher.

      Note

      The elastic type will be deprecated. We recommend using the auto-scaling node pool feature by adding the k8s.aliyun.com/resource-policy-wait-for-ecs-scaling: "true" label in podLabels.

      Note

      The acs type automatically adds the alibabacloud.com/compute-class: default and alibabacloud.com/compute-class: general-purpose labels to pods. You can overwrite these default values by specifying different values in podLabels. If the alpha.alibabacloud.com/compute-qos-strategy annotation is present in podAnnotations, the alibabacloud.com/compute-class: default label is not added to pods.

      Important

      Scheduler versions earlier than 6.8.3 do not support multiple units of the acs type.

    • nodeSelector: Identifies the nodes in this scheduling unit using the label of the node. This is only applicable to ecs resources.

    • max (available for scheduler versions 5.0 or higher): Sets the maximum number of pod replicas that can be scheduled to the unit.

    • podAnnotations: A map of type map[string]string. The scheduler adds the key-value pairs in podAnnotations to the pod. Only pods that carry these key-value pairs are counted toward the number of pods in this unit.

    • podLabels: A map of type map[string]string. The scheduler adds the key-value pairs in podLabels to the pod. Only pods that carry these key-value pairs are counted toward the number of pods in this unit.

      Note

      If the podLabels parameter of a unit includes the k8s.aliyun.com/resource-policy-wait-for-ecs-scaling: "true" label and the number of pods in the unit is less than the value of the max parameter, pods wait in the unit instead of moving on to the next one. The maximum wait time is defined by the whenTryNextUnits parameter. The k8s.aliyun.com/resource-policy-wait-for-ecs-scaling: "true" label itself is not added to pods, and pods do not need to carry it to be counted.

  • preemptPolicy (available for scheduler versions 6.1 or higher): Determines if the ResourcePolicy can preempt resources when pod scheduling to a unit fails. If set to BeforeNextUnit, the scheduler tries to preempt resources each time it fails to schedule pods to a unit. If set to AfterAllUnits, the scheduler only tries to preempt resources after failing to schedule pods to all units. The default is AfterAllUnits.

  • ignorePreviousPod (available for scheduler versions 6.1 or higher): Must be used with the max parameter in units. If set to true, pods scheduled before the ResourcePolicy creation are not counted when tallying the number of pods.

  • ignoreTerminatingPod (available for scheduler versions 6.1 or higher): Must be used with the max parameter in units. If set to true, pods in the Terminating state are not counted when tallying the number of pods.

  • matchLabelKeys (available for scheduler versions 6.2 or higher): Must be used with the max parameter in units. Pods are grouped based on their label values, and each group has a different max limit. Pods without the specified labels in matchLabelKeys are rejected.

  • whenTryNextUnits (available for clusters running Kubernetes 1.24 or later with a scheduler version of 6.4 or higher): Defines the conditions under which pods can use resources in subsequent units.

    • policy: Specifies the policy for pods. Valid options: ExceedMax, LackResourceAndNoTerminating, TimeoutOrExceedMax, and LackResourceOrExceedMax (default).

      • ExceedMax: When the max parameter for a given unit is unspecified or the pod count within the unit meets or exceeds the max value, pods may utilize resources from the subsequent unit. This approach can be effectively combined with Auto Scaling and Elastic Container Instance (ECI) to prioritize node pool scaling through Auto Scaling.

        Important
        • If the autoscaler fails to add nodes to a node pool for an extended period, pods may remain in the Pending state.

        • The autoscaler does not recognize the max limit of the ResourcePolicy. The actual number of instances added may exceed the max limit. This issue will be addressed in future versions.

      • TimeoutOrExceedMax: This policy applies when either of the following conditions is met:

        • The max parameter of the current unit is set, and the number of pods in the unit is less than the max value.

        • The max parameter of the current unit is not set, and the podLabels parameter of the unit includes the k8s.aliyun.com/resource-policy-wait-for-ecs-scaling: "true" label.

        In either case, if the resources in the current unit are insufficient to schedule a pod, the pod waits in the current unit. The timeout parameter defines the maximum wait time. This policy can be combined with Auto Scaling and Elastic Container Instance (ECI): node pool scale-out through Auto Scaling is attempted first, and pods fall back to elastic container instances after the timeout.

        Important

        If newly added nodes do not reach the Ready state before the timeout ends and pods are not configured to tolerate the NotReady taint, pods will still be scheduled to elastic container instances.

      • LackResourceOrExceedMax: When the number of pods in a unit meets or exceeds the max parameter value, or if resources in the current unit are insufficient, pods may utilize resources from the next unit. This default policy accommodates most scenarios.

      • LackResourceAndNoTerminating: When the current unit hosts a number of pods that meets or exceeds the max parameter value, yet lacks sufficient resources, and none of the pods are in the Terminating state, pods are permitted to utilize resources from the subsequent unit. This policy is effectively paired with a rolling update policy to avoid dispatching new pods to following units if there are pods in the process of terminating in the current unit.

    • timeout: Defines the timeout period when the policy parameter is set to TimeoutOrExceedMax. If not specified, the default is 15 minutes.
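
As an illustration of how the parameters above can be combined, the following sketch prefers ECS nodes in an auto-scaling node pool and falls back to ECI only after a wait. It assumes a cluster running Kubernetes 1.24 or later with a scheduler version of 6.4 or higher, so that whenTryNextUnits is available. The policy name, the app: web selector, the node pool ID placeholder, and the 5m timeout are hypothetical values chosen for this example.

```yaml
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: scaling-then-eci        # hypothetical name
  namespace: default
spec:
  selector:
    app: web                    # hypothetical pod label
  strategy: prefer
  units:
  # Unit 1: ECS nodes in an auto-scaling node pool. No max is set, so the
  # wait label in podLabels is what makes pods wait here for scale-out.
  - resource: ecs
    nodeSelector:
      alibabacloud.com/nodepool-id: "<node-pool-id>"   # placeholder, replace with a real ID
    podLabels:
      k8s.aliyun.com/resource-policy-wait-for-ecs-scaling: "true"
  # Unit 2: ECI, intended to be used only after the timeout below expires.
  - resource: eci
  whenTryNextUnits:
    policy: TimeoutOrExceedMax
    timeout: 5m                 # hypothetical; the documented default is 15 minutes
```

Under these assumptions, a pod that does not fit on the existing nodes of the node pool should wait for up to five minutes while the node pool scales out, and be scheduled to ECI only if no suitable node becomes available in that time.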

Sample scenarios

Scenario 1: Priority-Based Scheduling for Node Pools

When deploying a Deployment in a cluster with two node pools, Node Pool A and Node Pool B, you may want to prioritize Node Pool A and only schedule pods to Node Pool B if Node Pool A's resources are insufficient. During scale-in activities, pods from Node Pool B should be deleted first. In this example, cn-beijing.10.0.3.137 and cn-beijing.10.0.3.138 are in Node Pool A, while cn-beijing.10.0.6.47 and cn-beijing.10.0.6.46 are in Node Pool B. Each node has 2 vCPUs and 4 GB of memory. Follow these steps to configure priority-based resource scheduling for node pools:

  1. Create a ResourcePolicy using the following YAML file to specify the scheduling sequence for node pools.

    apiVersion: scheduling.alibabacloud.com/v1alpha1
    kind: ResourcePolicy
    metadata:
      name: nginx
      namespace: default
    spec:
      selector:
        app: nginx # Must match the label of the pods created by the Deployment.
      strategy: prefer
      units:
      - resource: ecs
        nodeSelector:
          alibabacloud.com/nodepool-id: np7ec79f2235954e879de07b780058****
      - resource: ecs
        nodeSelector:
          alibabacloud.com/nodepool-id: npab2df797738644e3a7b7cbf532bb****
    Note

    You can find the ID of a node pool on the Node Management > Node Pool page of the cluster. For more details, see Create and Manage Node Pools.

  2. Create a file named nginx.yaml with the following content. The file defines a Deployment with two pod replicas.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          name: nginx
          labels:
            app: nginx # The pod label must be the same as the one that you specified for the selector in the ResourcePolicy.
        spec:
          containers:
          - name: nginx
            image: nginx
            resources:
              limits:
                cpu: 2
              requests:
                cpu: 2
  3. Create the Nginx application and verify the deployment outcome.

    1. Execute the following command to create the Nginx application.

      kubectl apply -f nginx.yaml

      Expected output:

      deployment.apps/nginx created
    2. Check the deployment result with the following command.

      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS    RESTARTS   AGE   IP               NODE                    NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-b****   1/1     Running   0          17s   172.29.112.216   cn-beijing.10.0.3.137   <none>           <none>
      nginx-9cdf7bbf9-k****   1/1     Running   0          17s   172.29.113.24    cn-beijing.10.0.3.138   <none>           <none>

      The output indicates that the two pods are scheduled to nodes in Node Pool A.

  4. Scale out the Deployment.

    1. Scale out the pods to four replicas with the following command.

      kubectl scale deployment nginx --replicas 4

      Expected output:

      deployment.apps/nginx scaled
    2. Verify the pod status with the following command.

      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS    RESTARTS   AGE    IP               NODE                    NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-b****   1/1     Running   0          101s   172.29.112.216   cn-beijing.10.0.3.137   <none>           <none>
      nginx-9cdf7bbf9-k****   1/1     Running   0          101s   172.29.113.24    cn-beijing.10.0.3.138   <none>           <none>
      nginx-9cdf7bbf9-m****   1/1     Running   0          18s    172.29.113.156   cn-beijing.10.0.6.47    <none>           <none>
      nginx-9cdf7bbf9-x****   1/1     Running   0          18s    172.29.113.89    cn-beijing.10.0.6.46    <none>           <none>

      The output shows that additional pods are scheduled to Node Pool B due to insufficient resources in Node Pool A.

  5. Scale in the Deployment.

    1. Scale in the pods from four replicas to two with the following command.

      kubectl scale deployment nginx --replicas 2

      Expected output:

      deployment.apps/nginx scaled
    2. Check the pod status with the following command.

      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS        RESTARTS   AGE     IP               NODE                    NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-b****   1/1     Running       0          2m41s   172.29.112.216   cn-beijing.10.0.3.137   <none>           <none>
      nginx-9cdf7bbf9-k****   1/1     Running       0          2m41s   172.29.113.24    cn-beijing.10.0.3.138   <none>           <none>
      nginx-9cdf7bbf9-m****   0/1     Terminating   0          78s     172.29.113.156   cn-beijing.10.0.6.47    <none>           <none>
      nginx-9cdf7bbf9-x****   0/1     Terminating   0          78s     172.29.113.89    cn-beijing.10.0.6.46    <none>           <none>

      The output indicates that the pods on nodes in Node Pool B are deleted first, which is the reverse of the scheduling order.
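
A possible variation on this scenario caps the number of pods in Node Pool A instead of relying only on resource exhaustion. The sketch below reuses the node pool IDs from step 1 and assumes a cluster running Kubernetes 1.22 or later with a scheduler version of 6.1 or higher, so that max, ignorePreviousPod, and ignoreTerminatingPod are all available. The max value of 2 is a hypothetical choice.

```yaml
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: nginx
  namespace: default
spec:
  selector:
    app: nginx
  strategy: prefer
  units:
  # Node Pool A: preferred, but limited to two counted pods (hypothetical cap).
  - resource: ecs
    max: 2
    nodeSelector:
      alibabacloud.com/nodepool-id: np7ec79f2235954e879de07b780058****
  # Node Pool B: no max, receives the remaining pods.
  - resource: ecs
    nodeSelector:
      alibabacloud.com/nodepool-id: npab2df797738644e3a7b7cbf532bb****
  # Set explicitly so that pod counting does not depend on scheduler defaults.
  ignorePreviousPod: false
  ignoreTerminatingPod: true
```

With the default LackResourceOrExceedMax policy, the third replica would then be expected to go to Node Pool B as soon as two pods are counted in Node Pool A, even if Node Pool A still has free capacity.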

Scenario 2: Mixed Scheduling for ECS and ECI

When deploying a Deployment, if your cluster has subscription ECS instances, pay-as-you-go ECS instances, and elastic container instances, you may want to schedule pods based on cost efficiency: subscription ECS instances first, then pay-as-you-go ECS instances, and finally elastic container instances. During scale-in activities, pods should be deleted in the reverse order: elastic container instances first, followed by pay-as-you-go ECS instances, and lastly subscription ECS instances. In this example, each node has 2 vCPUs and 4 GB of memory. Follow these steps to configure mixed scheduling for ECS and ECI:

  1. Assign different labels to nodes based on their billing types using the following commands (alternatively, use the node pool feature to automatically manage labels).

    kubectl label node cn-beijing.10.0.3.137 paidtype=subscription
    kubectl label node cn-beijing.10.0.3.138 paidtype=subscription
    kubectl label node cn-beijing.10.0.6.46 paidtype=pay-as-you-go
    kubectl label node cn-beijing.10.0.6.47 paidtype=pay-as-you-go
  2. Create a ResourcePolicy specifying the scheduling sequence for resources using the following YAML file.

    apiVersion: scheduling.alibabacloud.com/v1alpha1
    kind: ResourcePolicy
    metadata:
      name: nginx
      namespace: default
    spec:
      selector:
        app: nginx # Must match the label of the pods created by the Deployment.
      strategy: prefer
      units:
      - resource: ecs
        nodeSelector:
          paidtype: subscription
      - resource: ecs
        nodeSelector:
          paidtype: pay-as-you-go
      - resource: eci
  3. Create a file named nginx.yaml with the following content. The file defines a Deployment with two pod replicas.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          name: nginx
          labels:
            app: nginx # The pod label must be the same as the one that you specified for the selector in the ResourcePolicy.
        spec:
          containers:
          - name: nginx
            image: nginx
            resources:
              limits:
                cpu: 2
              requests:
                cpu: 2
  4. Create the Nginx application and verify the deployment outcome.

    1. Execute the following command to create the Nginx application.

      kubectl apply -f nginx.yaml

      Expected output:

      deployment.apps/nginx created
    2. Check the deployment result with the following command.

      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS    RESTARTS   AGE   IP               NODE                    NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-b****   1/1     Running   0          66s   172.29.112.215   cn-beijing.10.0.3.137   <none>           <none>
      nginx-9cdf7bbf9-r****   1/1     Running   0          66s   172.29.113.23    cn-beijing.10.0.3.138   <none>           <none>

      The output indicates that the first two pods are scheduled to nodes with the label paidtype=subscription.

  5. Scale out the Deployment.

    1. Scale out the pods to four replicas with the following command.

      kubectl scale deployment nginx --replicas 4

      Expected output:

      deployment.apps/nginx scaled
    2. Verify the pod status with the following command.

      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS    RESTARTS   AGE     IP               NODE                    NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-4****   1/1     Running   0          16s     172.29.113.155   cn-beijing.10.0.6.47    <none>           <none>
      nginx-9cdf7bbf9-b****   1/1     Running   0          3m48s   172.29.112.215   cn-beijing.10.0.3.137   <none>           <none>
      nginx-9cdf7bbf9-f****   1/1     Running   0          16s     172.29.113.88    cn-beijing.10.0.6.46    <none>           <none>
      nginx-9cdf7bbf9-r****   1/1     Running   0          3m48s   172.29.113.23    cn-beijing.10.0.3.138   <none>           <none>

      The output shows that when the nodes with the label paidtype=subscription have insufficient resources, pods are scheduled to nodes with the label paidtype=pay-as-you-go.

    3. Increase the pod count to six replicas with the following command.

      kubectl scale deployment nginx --replicas 6

      Expected output:

      deployment.apps/nginx scaled
    4. Check the pod status with the following command.

      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS    RESTARTS   AGE     IP               NODE                           NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-4****   1/1     Running   0          3m10s   172.29.113.155   cn-beijing.10.0.6.47           <none>           <none>
      nginx-9cdf7bbf9-b****   1/1     Running   0          6m42s   172.29.112.215   cn-beijing.10.0.3.137          <none>           <none>
      nginx-9cdf7bbf9-f****   1/1     Running   0          3m10s   172.29.113.88    cn-beijing.10.0.6.46           <none>           <none>
      nginx-9cdf7bbf9-r****   1/1     Running   0          6m42s   172.29.113.23    cn-beijing.10.0.3.138          <none>           <none>
      nginx-9cdf7bbf9-s****   1/1     Running   0          36s     10.0.6.68        virtual-kubelet-cn-beijing-j   <none>           <none>
      nginx-9cdf7bbf9-v****   1/1     Running   0          36s     10.0.6.67        virtual-kubelet-cn-beijing-j   <none>           <none>

      The output indicates that the additional pods are scheduled to elastic container instances because the ECS nodes have insufficient resources.

  6. Scale in the Deployment.

    1. Scale in the pods from six replicas to four with the following command.

      kubectl scale deployment nginx --replicas 4

      Expected output:

      deployment.apps/nginx scaled
    2. Verify the pod status with the following command.

      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS        RESTARTS   AGE     IP               NODE                           NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-4****   1/1     Running       0          4m59s   172.29.113.155   cn-beijing.10.0.6.47           <none>           <none>
      nginx-9cdf7bbf9-b****   1/1     Running       0          8m31s   172.29.112.215   cn-beijing.10.0.3.137          <none>           <none>
      nginx-9cdf7bbf9-f****   1/1     Running       0          4m59s   172.29.113.88    cn-beijing.10.0.6.46           <none>           <none>
      nginx-9cdf7bbf9-r****   1/1     Running       0          8m31s   172.29.113.23    cn-beijing.10.0.3.138          <none>           <none>
      nginx-9cdf7bbf9-s****   1/1     Terminating   0          2m25s   10.0.6.68        virtual-kubelet-cn-beijing-j   <none>           <none>
      nginx-9cdf7bbf9-v****   1/1     Terminating   0          2m25s   10.0.6.67        virtual-kubelet-cn-beijing-j   <none>           <none>

      The output shows that the pods on elastic container instances are deleted first, which is the reverse of the scheduling order.

    3. Scale in the pods from four replicas to two with the following command.

      kubectl scale deployment nginx --replicas 2

      Expected output:

      deployment.apps/nginx scaled
    4. Check the pod status with the following command.

      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS        RESTARTS   AGE     IP               NODE                    NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-4****   0/1     Terminating   0          6m43s   172.29.113.155   cn-beijing.10.0.6.47    <none>           <none>
      nginx-9cdf7bbf9-b****   1/1     Running       0          10m     172.29.112.215   cn-beijing.10.0.3.137   <none>           <none>
      nginx-9cdf7bbf9-f****   0/1     Terminating   0          6m43s   172.29.113.88    cn-beijing.10.0.6.46    <none>           <none>
      nginx-9cdf7bbf9-r****   1/1     Running       0          10m     172.29.113.23    cn-beijing.10.0.3.138   <none>           <none>

      The output shows that the pods on nodes with the label paidtype=pay-as-you-go are deleted next, again following the reverse of the scheduling order.

    5. After the terminating pods are removed, check the pod status again with the following command.

      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS    RESTARTS   AGE   IP               NODE                    NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-b****   1/1     Running   0          11m   172.29.112.215   cn-beijing.10.0.3.137   <none>           <none>
      nginx-9cdf7bbf9-r****   1/1     Running   0          11m   172.29.113.23    cn-beijing.10.0.3.138   <none>           <none>

      The output confirms that only the pods on nodes with the label paidtype=subscription remain.
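
The policy from step 2 could also be adapted for rolling updates. In the following sketch, which assumes a scheduler version of 6.2 or higher, matchLabelKeys groups pods by the value of pod-template-hash, the label that the Deployment controller adds to the pods of each ReplicaSet, so that max is applied to each group separately. The max value of 2 is hypothetical.

```yaml
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: nginx
  namespace: default
spec:
  selector:
    app: nginx
  strategy: prefer
  units:
  # Subscription nodes: up to two pods per pod-template-hash group (hypothetical cap).
  - resource: ecs
    max: 2
    nodeSelector:
      paidtype: subscription
  # Pay-as-you-go nodes, then ECI, take the overflow.
  - resource: ecs
    nodeSelector:
      paidtype: pay-as-you-go
  - resource: eci
  # Group pods by ReplicaSet revision when counting against max.
  matchLabelKeys:
  - pod-template-hash
```

During a rolling update, the pods of the old and new revisions should then be counted as separate groups. Keep in mind that pods without the pod-template-hash label are rejected, as described for matchLabelKeys in the Procedure section.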

References

  • When deploying Services in an ACK cluster, you can configure tolerations and node affinity to use only ECS instances or elastic container instances, or allow the scheduler to automatically request elastic container instances when ECS instances are insufficient. Different scheduling policies can be configured to scale resources in various scenarios. For more information, see Specify ECS and ECI Resource Allocation.

  • Ensuring high availability and performance is critical for distributed tasks. Within ACK Pro clusters, you can leverage Kubernetes-native scheduling semantics to distribute tasks across multiple zones for high availability, or target specific zones to optimize for performance. For more information, see Distribute and affinity schedule ECI pods across zones.
