When your business workloads decrease, Auto Scaling triggers scale-in events in your scaling group. This automates the adjustment of resources and minimizes resource costs. This topic describes how to perform graceful scale-in operations.
Introduction to the scale-in process
When a scale-in process is triggered in your scaling group, Auto Scaling selects instances to remove from the scaling group based on the configured scale-in policy. After the instances are removed, they are reclaimed based on the predefined instance reclaim mode. The configurations vary based on different phases in a scale-in process, as shown in the following figure.
Trigger a scale-in event
Control the scale-in boundary to meet the daily business requirements
Implementation method: Configure the Minimum Number of Instances parameter for the scaling group.
The Minimum Number of Instances parameter specifies the lower limit of the number of instances in a scaling group. When a scale-in request is initiated, Auto Scaling rejects the request if completing it would reduce the number of instances in the scaling group below this lower limit. This ensures that the scaling group always retains enough resources to meet the daily business requirements.
Operations: For more information, see Manage scaling groups.
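The boundary check described for the Minimum Number of Instances parameter can be sketched as follows. This is a minimal illustration, not Auto Scaling code: the function name evaluate_scale_in and its parameters are hypothetical, and the sketch only models the rule stated above (a request is rejected when it would leave fewer instances than the lower limit).

```python
def evaluate_scale_in(current: int, to_remove: int, min_size: int) -> bool:
    """Return True if the scale-in request is accepted, False if it is rejected.

    Hypothetical helper: models only the lower-limit rule described in the text.
    """
    if to_remove <= 0:
        raise ValueError("to_remove must be a positive number of instances")
    # Accept only if the group still holds at least min_size instances afterwards.
    return current - to_remove >= min_size


if __name__ == "__main__":
    # A group of 5 instances with a lower limit of 3.
    print(evaluate_scale_in(current=5, to_remove=2, min_size=3))
    print(evaluate_scale_in(current=5, to_remove=3, min_size=3))
```

In this sketch, removing two of five instances is accepted because three remain, while removing three is rejected because only two would remain.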
Scale in instances based on the workload tiers (step scaling rule)
Implementation method: Create a step scaling rule for the scaling group.
You can create a step scaling rule to scale in instances based on workload tiers. This method prevents system overloads or interruptions caused by the rapid removal of multiple instances and ensures graceful scale-in events. For example, you want to design a custom scale-in solution based on the following CPU utilization tiers in your scaling group:
Scale in five ECS instances if the average CPU utilization drops below 20%.
Scale in three ECS instances if the average CPU utilization ranges between 20% and 30%.
Scale in one ECS instance if the average CPU utilization ranges between 30% and 50%.
In this case, you can create a step scaling rule, as shown in the following figure.
Operations: For more information, see Manage scaling rules.
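The CPU utilization tiers in the example above can be expressed as a small lookup. This is a sketch for reasoning about the tiers, not the way a step scaling rule is configured. The function name is hypothetical, and the handling of the exact boundary values (20%, 30%, and 50%) is an assumption, because the text does not state which tier a boundary value belongs to.

```python
def instances_to_remove(avg_cpu_percent: float) -> int:
    """Map the average CPU utilization to the number of ECS instances to scale in.

    Assumption: each tier includes its lower bound and excludes its upper bound.
    """
    if avg_cpu_percent < 20:
        return 5  # lightest load: remove five instances
    if avg_cpu_percent < 30:
        return 3  # 20% to 30%: remove three instances
    if avg_cpu_percent < 50:
        return 1  # 30% to 50%: remove one instance
    return 0      # 50% or higher: no scale-in


if __name__ == "__main__":
    for cpu in (10, 25, 40, 60):
        print(cpu, instances_to_remove(cpu))
```

The lower the load, the more instances are removed in a single step, and no instance is removed when the load is at or above the highest tier.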
Configure a cooldown period and an event-triggered task to control the scale-in rate and frequency
You can configure a cooldown period and an event-triggered task to prevent business instability caused by frequent scale-in operations and ensure graceful scale-in events.
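The effect of a cooldown period on the scale-in frequency can be sketched as a simple gate. This example models only the cooldown part of the description above. The class name CooldownGate and its fields are hypothetical, and the sketch assumes that a scale-in attempt made before the cooldown period elapses is rejected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CooldownGate:
    """Hypothetical model of a cooldown period, measured in seconds."""

    cooldown_seconds: float
    last_activity_at: Optional[float] = None

    def try_scale_in(self, now: float) -> bool:
        """Return True and start a new cooldown if a scale-in is allowed at `now`."""
        in_cooldown = (
            self.last_activity_at is not None
            and now - self.last_activity_at < self.cooldown_seconds
        )
        if in_cooldown:
            return False  # too soon after the previous scale-in
        self.last_activity_at = now
        return True


if __name__ == "__main__":
    gate = CooldownGate(cooldown_seconds=300)
    print(gate.try_scale_in(now=0.0))
    print(gate.try_scale_in(now=100.0))
    print(gate.try_scale_in(now=300.0))
```

With a 300-second cooldown, the attempt at 100 seconds is rejected and the attempt at 300 seconds is accepted, which caps how often instances can be removed.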
Specify the time when the scale-in event is triggered
You can specify when a scale-in event is triggered based on your business requirements, which helps keep scale-in events graceful. You can use one of the following methods:
Select the instances that you want to scale in
By default, Auto Scaling scales in instances based on the specified order of vSwitches of your scaling group (priority policy). You can modify the scale-in policy to select the instances that you want to scale in based on your business requirements.
If you do not want a mission-critical instance to be scaled in, you can put this instance into the Protected state to prevent business interruption caused by unexpected instance scale-in. For more information, see Manually put instances into the Protected state or move instances out of the Protected state.
Scaling groups of the Elastic Container Instance type do not support the Scale-In Policy and Scaling Policy parameters. By default, Auto Scaling preferentially removes elastic container instances created from the earliest scaling configuration from scaling groups, and then removes the earliest elastic container instances from the scaling groups.
Solution 1: Balance the distribution of instances across zones after a scale-in process is complete
This solution supports disaster recovery. After a scale-in process is complete, the remaining instances are evenly distributed across multiple zones.
Implementation method: Set the Scaling Policy parameter to Balanced Distribution Policy.
After you enable the balanced distribution policy, Auto Scaling preferentially scales in instances from the zone that has the largest number of instances. If you want the scale-in process to continue after the balanced distribution policy takes effect, set the Scale-In Policy parameter to Instances Created From Earliest Scaling Configuration, Earliest Instances, or Most Recent Instances.
Operations: For more information, see Scenario 2: Balanced distribution policy + scale-in policy.
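The selection order described in Solution 1 can be sketched as follows: pick the zone with the most instances, then apply a scale-in policy inside that zone. This is an illustrative model, not the Auto Scaling implementation. The Instance type and pick_balanced function are hypothetical, the sketch assumes Earliest Instances as the scale-in policy, and breaking ties between zones by zone ID is an assumption made only to keep the output deterministic.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instance:
    instance_id: str
    zone_id: str
    created_at: int  # creation time as epoch seconds


def pick_balanced(instances: List[Instance], count: int) -> List[str]:
    """Select up to `count` instance IDs, always taking from the fullest zone."""
    remaining = list(instances)
    removed: List[str] = []
    for _ in range(min(count, len(remaining))):
        per_zone = Counter(i.zone_id for i in remaining)
        # Zone with the most instances; ties go to the smallest zone ID (assumption).
        fullest = max(sorted(per_zone), key=lambda zone: per_zone[zone])
        # Inside that zone, remove the earliest instance (assumed scale-in policy).
        victim = min(
            (i for i in remaining if i.zone_id == fullest),
            key=lambda i: i.created_at,
        )
        remaining.remove(victim)
        removed.append(victim.instance_id)
    return removed


if __name__ == "__main__":
    group = [
        Instance("a1", "zone-a", 1),
        Instance("a2", "zone-a", 2),
        Instance("a3", "zone-a", 3),
        Instance("b1", "zone-b", 0),
    ]
    print(pick_balanced(group, 2))
```

Because the zone counts are recomputed after every removal, the group moves toward an even distribution across zones as instances are removed.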
Solution 2: Prioritize the scale-in of instances that have the highest unit price (cost optimization policy)
This solution improves cost-effectiveness. You can enable the cost optimization policy to scale in the least cost-effective instances first. This improves resource utilization.
Implementation method: Set the Scaling Policy parameter to Cost Optimization Policy.
After you enable the cost optimization policy, Auto Scaling preferentially scales in instances that have the highest unit price from your scaling group. If you want the scale-in process to continue after the cost optimization policy takes effect, set the Scale-In Policy parameter to Instances Created From Earliest Scaling Configuration, Earliest Instances, or Most Recent Instances.
Operations: For more information, see Scenario 3: Cost optimization policy + scale-in policy.
This solution helps you balance resource costs. You can configure the ratio of preemptible instances to pay-as-you-go instances in your scaling group.
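The ordering used by the cost optimization policy in Solution 2 can be sketched as a sort by unit price. This is a simplified model under stated assumptions: the PricedInstance type, the unit_price field, and the pick_most_expensive function are hypothetical, and the tie-break on creation time stands in for the Earliest Instances scale-in policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PricedInstance:
    instance_id: str
    unit_price: float  # hypothetical unit price; the currency and unit are not modeled
    created_at: int    # creation time as epoch seconds


def pick_most_expensive(instances: List[PricedInstance], count: int) -> List[str]:
    """Select up to `count` instance IDs, highest unit price first."""
    # Sort by descending unit price, then by earliest creation time (assumed tie-break).
    ranked = sorted(instances, key=lambda i: (-i.unit_price, i.created_at))
    return [i.instance_id for i in ranked[:count]]


if __name__ == "__main__":
    group = [
        PricedInstance("x", 0.5, 10),
        PricedInstance("y", 0.9, 20),
        PricedInstance("z", 0.9, 5),
    ]
    print(pick_most_expensive(group, 2))
```

Instances z and y share the highest unit price, so both are selected before x, and z comes first because it was created earlier.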
Solution 3: Create a custom combination policy
You can combine Solution 1 and Solution 2.
Implementation method: Set the Scaling Policy parameter for your scaling group to Custom Combination Policy.
When you enable the custom combination policy, you can adjust the ratio of pay-as-you-go instances to preemptible instances, balance the resource capacity across multiple zones, and create capacity planning policies for pay-as-you-go and preemptible instances.
Operations: For more information, see Combine scaling policies and scale-in policies.
Solution 4: Create a custom scale-in policy
If the scale-in policies supported by Auto Scaling cannot meet your business requirements, you can use Function Compute to create a custom scale-in policy, as described in this solution.
Implementation method: Set the Scale-In Policy parameter to Custom Policy.
You can create a custom scale-in policy by using programming languages in Function Compute. Each time a scale-in event is triggered, the function you created in Function Compute is invoked. You can define which instances can be scaled in and which instances cannot when you create the function based on your business requirements.
Operations: For more information, see Use Function Compute to create custom scale-in policies for ECS instances.
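A custom scale-in function of the kind described in Solution 4 might look like the following sketch. The handler(event, context) entry point follows the usual shape of a Python event function in Function Compute. Everything about the payload is an assumption made for illustration: the field names Instances, InstanceId, CreationTime, CapacityToRemove, Capacity, and InstanceIds, as well as the PROTECTED_IDS set, are hypothetical, so check the topic referenced above for the actual request and response format.

```python
import json

# Hypothetical set of instances that must never be scaled in.
PROTECTED_IDS = {"i-keep-001"}


def handler(event, context):
    """Choose which instances can be scaled in (payload field names are assumed)."""
    payload = json.loads(event)  # accepts a JSON document as str or bytes
    # Assumed shape: a list of {"Capacity": n} entries stating how many to remove.
    wanted = sum(item["Capacity"] for item in payload["CapacityToRemove"])
    # Exclude instances that the business logic does not allow to be removed.
    candidates = [
        inst for inst in payload["Instances"]
        if inst["InstanceId"] not in PROTECTED_IDS
    ]
    # Prefer the oldest instances; timestamps in one ISO 8601 format sort as strings.
    candidates.sort(key=lambda inst: inst["CreationTime"])
    chosen = [inst["InstanceId"] for inst in candidates[:wanted]]
    return json.dumps({"InstanceIds": chosen})


if __name__ == "__main__":
    sample = json.dumps({
        "CapacityToRemove": [{"Capacity": 1}],
        "Instances": [
            {"InstanceId": "i-keep-001", "CreationTime": "2024-01-01T00:00:00Z"},
            {"InstanceId": "i-work-002", "CreationTime": "2024-02-01T00:00:00Z"},
        ],
    }).encode("utf-8")
    print(handler(sample, None))
```

The point of the sketch is the division of labor: the function receives the candidate instances, filters out the ones that cannot be scaled in, and returns the ones that can.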
Gracefully scale in instances
In a graceful scale-in, an instance that is selected for removal is removed only after it completes its ongoing tasks. This prevents business interruptions caused by the scale-in operation.
Implementation method: Create a lifecycle hook.
When a scale-in process is triggered, you can enable a lifecycle hook to put instances that have ongoing tasks into the Pending Remove state. During the effective period of the lifecycle hook, you can perform operations on the instances. If a longer period of time is required to complete the ongoing tasks, you can call an API operation to extend the effective period of the lifecycle hook.
Operations: For more information, see Overview and RecordLifecycleActionHeartbeat.
Scaling groups of the Elastic Container Instance type do not support the lifecycle hook feature. If you use a scaling group of the Elastic Container Instance type, you cannot use this solution.
If you directly remove, delete, or stop instances to achieve a similar scale-in effect, lifecycle hooks do not take effect, and you cannot use this solution.
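The lifecycle hook flow described above (wait for ongoing tasks, extend the effective period if more time is needed, then let the removal proceed) can be sketched as a drain loop. This sketch does not call any SDK. All four callables are hypothetical placeholders that you would wire to your own task check, to the RecordLifecycleActionHeartbeat operation, to whatever operation ends the lifecycle action in your setup, and to a wait function.

```python
import time
from typing import Callable


def drain_before_removal(
    has_ongoing_tasks: Callable[[], bool],
    record_heartbeat: Callable[[], None],
    complete_action: Callable[[], None],
    max_heartbeats: int,
    wait: Callable[[float], None] = time.sleep,
    poll_seconds: float = 30.0,
) -> int:
    """Extend the hook while tasks are running, then release the instance.

    Returns the number of heartbeats sent. All callables are hypothetical.
    """
    heartbeats = 0
    while has_ongoing_tasks() and heartbeats < max_heartbeats:
        record_heartbeat()   # placeholder: extend the effective period of the hook
        heartbeats += 1
        wait(poll_seconds)   # give the ongoing tasks time to finish
    complete_action()        # placeholder: allow the scale-in to proceed
    return heartbeats


if __name__ == "__main__":
    pending = [True, True, False]  # two checks with running tasks, then idle
    sent = drain_before_removal(
        has_ongoing_tasks=lambda: pending.pop(0),
        record_heartbeat=lambda: print("heartbeat"),
        complete_action=lambda: print("complete"),
        max_heartbeats=10,
        wait=lambda seconds: None,  # skip real waiting in this demo
    )
    print(sent)
```

The max_heartbeats cap is a design choice in this sketch so that a stuck task cannot hold an instance in the scaling group indefinitely.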
Reclaim instances that are scaled in
To improve the scale-in efficiency, the default instance reclaim mode is Forcibly Release. In this mode, Auto Scaling directly releases the instances that are removed from scaling groups. No resource is retained after instances are released. You can also use other instance reclaim modes. For more information, see Manage scaling groups.