Use node instant scaling for auto-scaling and enhanced efficiency - Container Service for Kubernetes

If your cluster is large, requires faster resource scaling, or needs to automatically scale resources across multiple instance types and zones, node auto scaling may not meet your requirements. In these scenarios, we recommend that you use node instant scaling. A cluster is considered large if a node pool that has auto scaling enabled contains more than 100 nodes, or if more than 20 node pools in the cluster have auto scaling enabled. The node instant scaling feature lowers the technical barrier for developers, improves scaling efficiency, and reduces O&M effort.

Before you start

To better work with the node instant scaling feature, we recommend that you read Overview of node scaling and pay attention to the following items before you start:

How node instant scaling works

Benefits of node instant scaling

Use scenarios of node instant scaling

Usage notes for node instant scaling

Prerequisites

A Container Service for Kubernetes (ACK) managed cluster or ACK dedicated cluster that runs Kubernetes 1.24 or later has been created. For more information, see Create an ACK managed cluster, Create an ACK dedicated cluster, and Update an ACK cluster.
Elastic Scaling Service (ESS) has been activated.

Note

If your node pool has auto scaling enabled and Scaling Mode is not set to Swift, the node instant scaling feature is compatible with the original semantics and behavior of the node pool configuration. In addition, this feature can be seamlessly enabled for all types of pods. If Scaling Mode is set to Swift, the node instant scaling feature is incompatible with the node pool.

Step 1: Enable node instant scaling

Before you use node instant scaling, you must enable auto scaling on the Node Pools page. When you configure node instant scaling, select Auto Scaling as the Node Scaling Method.

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Nodes > Node Pools.
On the Node Pools page, click Enable next to Node Scaling.
If this is the first time you use auto scaling, complete authorization as prompted. If you have already completed the authorization, skip this step.
- For an ACK managed cluster, authorize ACK to use the AliyunCSManagedAutoScalerRole for accessing your cloud resources.
- For an ACK dedicated cluster, authorize ACK to use the KubernetesWorkerRole and AliyunCSManagedAutoScalerRolePolicy for scaling management. The following figure shows the console page on which you can make the authorization when you enable Node Scaling.

In the Node Scaling Configuration panel, set Node Scaling Method to Auto Scaling, configure scaling parameters, and then click OK.

Scale-out activities are automatically triggered based on the actual scheduling status. You need only to configure scale-in conditions.

Parameter	Description
Scale-in Threshold	The ratio of the resource requests of a node to the resource capacity of the node, in a node pool that has node instant scaling enabled. A scale-in activity is performed only when both the CPU utilization and the memory utilization of a node are lower than the Scale-in Threshold.
GPU Scale-in Threshold	The scale-in threshold for GPU-accelerated nodes. A scale-in activity is performed only when the CPU, memory, and GPU utilization of a node are all lower than the GPU Scale-in Threshold.
Defer Scale-in For	The interval between the time when the scale-in threshold is reached and the time when the scale-in activity (removing nodes) starts. Unit: minutes. Default value: 10. Important: The autoscaler performs scale-in activities only when Scale-in Threshold is configured and the Defer Scale-in For condition is met.
Cooldown	After the autoscaler performs a scale-out activity, the autoscaler waits a cooldown period before it can perform a scale-in activity. The autoscaler cannot perform scale-in activities within the cooldown period but can still check whether the nodes meet the scale-in conditions. After the cooldown period ends, if a node meets the scale-in conditions and the waiting period specified in the Defer Scale-in For parameter ends, the node is removed. For example, the Cooldown parameter is set to 10 minutes and the Defer Scale-in For parameter is set to 5 minutes. The autoscaler cannot perform scale-in activities within the 10-minute cooldown period after performing a scale-out activity. However, the autoscaler can still check whether the nodes meet the scale-in conditions within the cooldown period. When the cooldown period ends, the nodes that meet the scale-in conditions are removed after 5 minutes.
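The scale-in parameters above can be illustrated with a small calculation. The following is a minimal sketch, not the autoscaler's implementation: the names `NodeUsage`, `below_threshold`, and `earliest_removal_minute` are hypothetical, and the timing function follows only the literal example in the Cooldown description (a node that already meets the scale-in conditions when the cooldown period ends is removed after the Defer Scale-in For period).

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    """Hypothetical container for the requests and capacity of one node."""
    cpu_request: float    # sum of pod CPU requests on the node, in cores
    cpu_capacity: float   # CPU capacity of the node, in cores
    mem_request: float    # sum of pod memory requests on the node, in GiB
    mem_capacity: float   # memory capacity of the node, in GiB


def below_threshold(node: NodeUsage, threshold: float) -> bool:
    """Return True when both request/capacity ratios are lower than the threshold."""
    cpu_ratio = node.cpu_request / node.cpu_capacity
    mem_ratio = node.mem_request / node.mem_capacity
    return cpu_ratio < threshold and mem_ratio < threshold


def earliest_removal_minute(scale_out_minute: float,
                            cooldown: float,
                            defer: float) -> float:
    """Earliest removal time, in minutes, for a node that already meets the
    scale-in conditions when the cooldown period ends (assumption taken from
    the example in the Cooldown description)."""
    cooldown_end = scale_out_minute + cooldown
    return cooldown_end + defer


idle_node = NodeUsage(cpu_request=1.0, cpu_capacity=4.0,
                      mem_request=6.0, mem_capacity=16.0)
busy_node = NodeUsage(cpu_request=3.0, cpu_capacity=4.0,
                      mem_request=6.0, mem_capacity=16.0)

print(below_threshold(idle_node, 0.5))   # CPU 0.25 and memory 0.375 are both below 0.5
print(below_threshold(busy_node, 0.5))   # CPU 0.75 is not below 0.5
print(earliest_removal_minute(0, cooldown=10, defer=5))
```

With Cooldown set to 10 minutes and Defer Scale-in For set to 5 minutes, the sketch places the earliest removal 15 minutes after the scale-out activity, which matches the example in the table.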

View advanced scale-in settings

Parameter	Description
Pod Termination Timeout	The maximum amount of time to wait for pods on a node to terminate during a scale-in activity. Unit: seconds.
Minimum Number of Replicated Pods	The minimum number of pods that must remain in each ReplicaSet before a node can be scaled in.
Enable DaemonSet Pods Draining	When enabled, DaemonSet pods are evicted during a scale-in activity.
Skip Nodes with Pods in the Kube-System Namespace	When enabled, nodes that run pods in the kube-system namespace are ignored during a scale-in activity, so these nodes are not affected. Note: This feature does not take effect on mirror pods and DaemonSet pods.

Step 2: Create a node pool that has auto scaling enabled

The node instant scaling feature scales only nodes in node pools that have auto scaling enabled. Therefore, after configuring node instant scaling, you also need to configure at least one node pool that has auto scaling enabled.

Create a new node pool that has auto scaling enabled. For more information, see Create a node pool.
Enable auto scaling for an existing node pool. For more information, see Modify a node pool.
Note
When configuring an existing node pool, ensure that Expected Number of Nodes is set to null. To verify this, go to the node pool details page and find the parameter on the Overview tab. Alternatively, call the DescribeClusterNodePoolDetail API and check whether desired_size is nil.
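The desired_size check in the preceding note can be scripted once the DescribeClusterNodePoolDetail response has been parsed into a dictionary. The following is a minimal sketch under stated assumptions: the function name `expected_nodes_unset` is hypothetical, and the placement of desired_size under a `scaling_group` object is an assumption about the response layout that you should confirm against the actual response before relying on it.

```python
from typing import Any, Mapping


def expected_nodes_unset(detail: Mapping[str, Any]) -> bool:
    """Return True when desired_size is null or absent in a parsed response.

    Assumption: desired_size is nested under "scaling_group". If that object
    is missing, raise instead of guessing, so that a layout mismatch is not
    mistaken for an unset value.
    """
    if "scaling_group" not in detail:
        raise KeyError("scaling_group not found; adjust the lookup to match your response")
    scaling_group = detail["scaling_group"] or {}
    return scaling_group.get("desired_size") is None


# Hand-written sample data, not real API output.
print(expected_nodes_unset({"scaling_group": {"desired_size": None}}))
print(expected_nodes_unset({"scaling_group": {"desired_size": 3}}))
```

The function only inspects data that you pass in. How you call the API (SDK, CLI, or HTTP) is left open here.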

We recommend that you configure a diverse range of instance types for your node pool. This can be achieved by specifying multiple instance types, using generic instance types, and configuring multiple availability zones. These methods help ensure a sufficient inventory of instance types and the successful execution of node scaling activities.

Step 3: (Optional) Verify node instant scaling

After you complete the preceding configuration, you can use the node instant scaling feature. The node pool displays that auto scaling is enabled and Node Auto Scaling is installed in the cluster.

Auto scaling is enabled for the node pool

The Node Pools page shows that auto scaling is enabled for the node pool.

Node Auto Scaling is installed

On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Operations > Add-ons.
On the Add-ons page, the ACK GOATScaler component displays Installed.

Introduction to key events related to node instant scaling

The node instant scaling feature generates the following key events. These events help you understand the internal status of node instant scaling.

Event name	Event object	Description
ProvisionNode	pod	The node instant scaling feature triggers a node scale-out activity.
ProvisionNodeFailed	pod	The node instant scaling feature fails to trigger a node scale-out activity.
ResetPod	pod	The node instant scaling feature re-adds pods that meet the scale-out conditions and have triggered scale-out activities but are still in the Unschedulable state to the scale-out list.
InstanceInventoryStatusChanged	ACKNodePool	The supply status of an instance specification in the configured zone changes. The format is `{InstanceType}/{Zone} inventory status changed from {OldInventoryStatus} to {NewInventoryStatus}`. For more information, see View the health status of node instant scaling.
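If you collect InstanceInventoryStatusChanged events, the message format in the table can be split into its four fields. The following is a minimal sketch: `InventoryChange` and `parse_inventory_event` are hypothetical names, the sample status values `StatusA` and `StatusB` are placeholders rather than real inventory statuses, and the pattern assumes that no field contains whitespace.

```python
import re
from typing import NamedTuple, Optional


class InventoryChange(NamedTuple):
    instance_type: str
    zone: str
    old_status: str
    new_status: str


# Mirrors "{InstanceType}/{Zone} inventory status changed from
# {OldInventoryStatus} to {NewInventoryStatus}".
# Assumption: each placeholder is a single token without whitespace.
_PATTERN = re.compile(
    r"^(?P<instance_type>[^/\s]+)/(?P<zone>\S+) inventory status changed "
    r"from (?P<old>\S+) to (?P<new>\S+)$"
)


def parse_inventory_event(message: str) -> Optional[InventoryChange]:
    """Parse an event message, or return None if it does not match the format."""
    match = _PATTERN.match(message.strip())
    if match is None:
        return None
    return InventoryChange(
        instance_type=match["instance_type"],
        zone=match["zone"],
        old_status=match["old"],
        new_status=match["new"],
    )


sample = "ecs.g7.xlarge/cn-hangzhou-k inventory status changed from StatusA to StatusB"
print(parse_inventory_event(sample))
```

Returning None for non-matching messages lets a caller skip unrelated events without handling exceptions.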

Introduction to node instant scaling labels

The node instant scaling feature maintains the following labels, taints, and annotations. Do not modify them manually. Otherwise, exceptions may occur.

Node labels

Node label	Description
goatscaler.io/managed:true	Identifies nodes that are managed by the node instant scaling feature. The node instant scaling feature periodically checks whether nodes that have this label meet the scale-in conditions.
k8s.aliyun.com: true	Identifies nodes that are managed by the node instant scaling feature. The node instant scaling feature periodically checks whether nodes that have this label meet the scale-in conditions.
goatscaler.io/provision-task-id:{task-id}	Indicates the ID of a scale-out task run by the node instant scaling feature so that you can trace the source of the nodes that are added to the cluster.
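These labels can be read (not modified) to trace which scale-out task added each node. The following is a minimal sketch that works on a node list already loaded as a dictionary, for example from the JSON output of `kubectl get nodes -o json`. The function name `managed_nodes_by_task` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

MANAGED_LABEL = "goatscaler.io/managed"
TASK_LABEL = "goatscaler.io/provision-task-id"


def managed_nodes_by_task(node_list: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group the names of managed nodes by their scale-out task ID."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node in node_list.get("items", []):
        meta = node.get("metadata", {})
        labels = meta.get("labels") or {}
        # Skip nodes that do not carry the managed label with the value "true".
        if labels.get(MANAGED_LABEL) != "true":
            continue
        # Managed nodes without a task label are grouped under "unknown".
        task_id = labels.get(TASK_LABEL, "unknown")
        groups[task_id].append(meta.get("name", ""))
    return dict(groups)


# Hand-written sample data, not real cluster output.
nodes = {
    "items": [
        {"metadata": {"name": "node-a",
                      "labels": {MANAGED_LABEL: "true", TASK_LABEL: "task-1"}}},
        {"metadata": {"name": "node-b",
                      "labels": {MANAGED_LABEL: "true", TASK_LABEL: "task-1"}}},
        {"metadata": {"name": "node-c", "labels": {}}},
    ]
}
print(managed_nodes_by_task(nodes))
```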

Node taints

Node taint	Description
goatscaler.io/node-terminating	Nodes that have this taint are scaled in by the node instant scaling feature.
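To see which nodes are currently being scaled in, you can look for this taint in the node specification. The following is a minimal sketch that inspects a single node object already loaded as a dictionary in the standard Kubernetes Node layout (`spec.taints`). The function name `is_being_scaled_in` and the sample data are hypothetical.

```python
from typing import Any, Dict

TERMINATING_TAINT = "goatscaler.io/node-terminating"


def is_being_scaled_in(node: Dict[str, Any]) -> bool:
    """Return True when the node carries the node-terminating taint."""
    # spec.taints may be missing or null on nodes that have no taints.
    taints = (node.get("spec") or {}).get("taints") or []
    return any(taint.get("key") == TERMINATING_TAINT for taint in taints)


# Hand-written sample data, not real cluster output.
terminating = {"spec": {"taints": [{"key": TERMINATING_TAINT, "effect": "NoSchedule"}]}}
healthy = {"spec": {}}
print(is_being_scaled_in(terminating), is_being_scaled_in(healthy))
```

The taint effect in the sample is a placeholder. The check relies only on the taint key listed in the table.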

Pod annotations

Pod annotation	Description
goatscaler.io/provision-task-id	Indicates the ID of a scale-out task that is created by the node instant scaling feature for the current pod. The node instant scaling feature does not add another node for a pod that has this annotation and waits for the current node to launch.
goatscaler.io/reschedule-deadline	Indicates the time that the node instant scaling feature waits for a pod to be scheduled to a node. If this time is exceeded and the pod is still unschedulable, the node instant scaling feature re-adds the pod to the scale-out list.

What to do next

View health status of node instant scaling

The node instant scaling feature can dynamically select instance types and zones based on the inventory status of Elastic Compute Service (ECS) instances. To monitor the inventory health of the instance types in a node pool, obtain suggestions for optimizing the instance type configuration, and help ensure that node scaling activities succeed, check the ConfigMap of the node pool. This allows you to assess the inventory health of the node pool and proactively analyze and adjust instance types.

For more information, see View the health status of node instant scaling.

Update Node Auto Scaling

We recommend that you update Node Auto Scaling at your earliest convenience to use the latest features and optimizations. For more information, see Manage components.