This topic provides solutions to some frequently asked questions (FAQs) about node instant scaling in Container Service for Kubernetes (ACK).
Scale-out behavior
What resources can node instant scaling simulate?
The following resources are supported:
cpu
memory
ephemeral-storage
aliyun.com/gpu-mem (supported only for shared GPUs)
nvidia.com/gpu
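As a sketch of how these resources come into play, a pending pod only needs to declare requests for them to be considered during scale-out simulation. The manifest below is a hypothetical example (pod name and image are placeholders; the GPU request applies only to node pools with GPU-accelerated instance types):

```yaml
# Hypothetical pod whose resource requests node instant scaling can simulate.
apiVersion: v1
kind: Pod
metadata:
  name: scale-out-demo
spec:
  containers:
  - name: app
    image: nginx:latest
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
        ephemeral-storage: 10Gi
        # nvidia.com/gpu: 1   # uncomment for GPU instance types
```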
Can node instant scaling adjust to the appropriate instance type in the node pool based on requests received by pods?
Yes, it can. For example, suppose you configure two instance types, 4 Core 8 GB and 12 Core 48 GB, for a node pool with Auto Scaling enabled, and a pod requests 2 cores. During a scale-out activity, node instant scaling preferentially creates a 4 Core 8 GB node to run the pod. If the 4 Core 8 GB instance type is later replaced with 8 Core 16 GB in the node pool configuration, node instant scaling automatically runs the pod on an 8 Core 16 GB node instead.
How does node instant scaling choose by default when multiple instance types are configured in the node pool?
Based on the instance types configured in the node pool, node instant scaling periodically excludes instance types with insufficient inventory. It then sorts the remaining types by the number of CPU cores and checks each one to see if it meets the resource requests of unschedulable pods. Once an instance type meets the requirements, node instant scaling prioritizes it and does not check the remaining types.
How does node instant scaling detect changes in instance type inventory in the node pool?
Node instant scaling offers health metrics that periodically update inventory changes in the Auto Scaling node pool. When the inventory status of an instance type changes, node instant scaling sends a Kubernetes Event named InstanceInventoryStatusChanged. You can subscribe to this event notification to monitor the inventory health of the node pool, assess its status, and analyze or adjust the instance type configuration in advance. For more information, see View the health status of node instant scaling.
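To observe these inventory notifications, you can filter cluster events by reason. A minimal example (the namespace in which the event is recorded may differ in your cluster, so all namespaces are queried):

```shell
# List recent events reporting instance type inventory changes,
# newest last, across all namespaces.
kubectl get events --all-namespaces \
  --field-selector reason=InstanceInventoryStatusChanged \
  --sort-by=.lastTimestamp
```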
How can I optimize node pool configuration to avoid scale-out failures due to insufficient inventory?
Consider the following suggestions to expand the range of instance type options:
Configure multiple optional instance types for the node pool, or use generalized configurations.
Configure multiple zones for the node pool.
Why does node instant scaling fail to add nodes after a scale-out activity is triggered?
Check for the following issues:
Instance types configured in the node pool have insufficient inventory.
The instance types configured in the node pool cannot meet the resource requests from the pods. Some resources provided by the specified Elastic Compute Service (ECS) instance type are reserved or occupied for the following purposes:
Resources are used for virtualization or occupied by the operating system during instance creation. For more information, see Why does a purchased instance have a memory size that differs from the memory size defined in the instance type?
ACK needs to occupy some resources to run Kubernetes components and system processes such as kubelet, kube-proxy, Terway, and the container runtime. For more information, see Resource reservation policy.
By default, system components are installed on each node. Therefore, the requested pod resources must be less than the resource capacity of the instance type.
The Resource Access Management (RAM) role lacks permissions to manage the Kubernetes cluster. For more information, see Enable node instant scaling.
The node pool with Auto Scaling enabled fails to scale out.
To ensure the accuracy of subsequent scaling and the stability of the system, the node instant scaling component does not perform any scaling operations until it resolves issues with abnormal nodes.
Scale-in behavior
Why does node instant scaling fail to remove nodes after a scale-in is triggered?
Check for the following issues:
Only scaling in empty nodes is enabled, but the node being removed is not empty.
The resource utilization of the node, calculated from pod resource requests, is higher than the specified scale-in threshold.
Pods in the kube-system namespace are running on the node.
A scheduling policy forces the pods to run on the current node. Therefore, the pods cannot be scheduled to other nodes.
PodDisruptionBudget is set for the pods on the node and the minimum value of PodDisruptionBudget has been reached.
Newly added nodes are not considered for scale-in within the first 10 minutes after they are created.
What types of pods can prevent node instant scaling from removing nodes?
If a pod is not created by a native Kubernetes controller, such as a Deployment, ReplicaSet, Job, or StatefulSet, or if the pods on a node cannot be safely terminated or migrated, node instant scaling may not remove the node.
Use pods to control scaling
How does node instant scaling use pods to control node scale-in activities?
You can use the pod annotation goatscaler.io/safe-to-evict to specify whether a pod prevents node instant scaling from scaling in a node.
To prevent the node from being scaled in: add the annotation "goatscaler.io/safe-to-evict": "false" to the pod.
To allow the node to be scaled in: add the annotation "goatscaler.io/safe-to-evict": "true" to the pod.
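For example, a pod that should keep its node from being scaled in can carry the annotation in its metadata. This is a minimal sketch; the pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: keep-my-node
  annotations:
    goatscaler.io/safe-to-evict: "false"   # set to "true" to allow scale-in
spec:
  containers:
  - name: app
    image: nginx:latest
```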
Use nodes to control scaling
How do I specify the nodes that I want to delete during the scale-in activities of node instant scaling?
You can add the taint goatscaler.io/force-to-delete:true:NoSchedule to the nodes that you want to delete. After you add this taint, node instant scaling deletes the node without checking pod status or whether pods have been evicted from the drained node. Use this feature with caution, because it may result in service interruptions or data loss.
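Assuming a node named <nodename>, the taint can be added with kubectl (note that kubectl uses key=value:effect syntax):

```shell
# Force-delete taint: the node becomes a scale-in candidate
# regardless of the status of the pods running on it.
kubectl taint node <nodename> goatscaler.io/force-to-delete=true:NoSchedule
```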
How do I prevent node instant scaling from removing nodes?
To prevent node instant scaling from removing a node, add the annotation "goatscaler.io/scale-down-disabled": "true" to the node by running the following command:
kubectl annotate node <nodename> goatscaler.io/scale-down-disabled=true
Can node instant scaling only scale in empty nodes?
You can configure whether to scale in only empty nodes at the node or cluster level, or both. If both are configured, the node-level setting takes precedence.
Node level: Add the label goatscaler.io/scale-down-only-empty:true or goatscaler.io/scale-down-only-empty:false to the node to enable or disable this feature, respectively.
Cluster level: On the Add-ons page in the Container Service Management Console, find the node instant scaling component and set ScaleDownOnlyEmptyNodes to true or false to enable or disable this feature as prompted.
The node instant scaling component
Are there any operations that trigger the automatic update of the node instant scaling component?
No. Except during system maintenance and upgrades, ACK does not automatically update the node instant scaling component. You need to update it manually on the Add-ons page in the Container Service Management Console.
Why does node scaling still fail after I complete role authorization in the ACK managed cluster?
This issue may be due to the absence of addon.aliyuncsmanagedautoscalerrole.token in the Secret under the kube-system namespace of the cluster. If the token is missing, use one of the following methods to add it:
Submit a ticket for technical support.
Manually add the permission: By default, ACK assumes the worker RAM role to use the relevant capabilities. Use the following steps to manually assign the AliyunCSManagedAutoScalerRolePolicy permission to the worker role:
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, go to the Node Pools page.
On the Node Pools page, click Enable next to Node Scaling.
Authorize the KubernetesWorkerRole role and the AliyunCSManagedAutoScalerRolePolicy system policy as prompted.
To apply the new RAM policy, manually restart the cluster-autoscaler or ack-goatscaler Deployment in the kube-system namespace. The cluster-autoscaler Deployment manages node auto scaling, while ack-goatscaler handles node instant scaling.
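The restart in the last step can be performed with a rollout restart; run the command for whichever Deployment exists in your cluster:

```shell
# Restart the scaling component so that it picks up the new RAM policy.
kubectl -n kube-system rollout restart deployment cluster-autoscaler   # node auto scaling
kubectl -n kube-system rollout restart deployment ack-goatscaler      # node instant scaling
```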