Predictive Scaling for Kubernetes Containers with AHPA

This article discusses the motivation behind the development of the Advanced Horizontal Pod Autoscaler component of Alibaba Cloud's Container Service for Kubernetes.

By Yuanyi and Zibai

Introduction

In the era of cloud-native containers, users face many business scenarios, including periodic workloads and on-demand serverless use. Automatic elasticity inevitably brings several problems, the most significant of which are elastic lag and cold start. These challenges inspired the Alibaba Cloud-Native team and the Decision Intelligent Timing Team of the Alibaba DAMO Academy to jointly develop the Advanced Horizontal Pod Autoscaler (AHPA) elasticity prediction component. The core idea of this solution is to plan scaling on a schedule based on the detected cycle and to scale out ahead of demand, so that resources are used on demand while business stability is preserved.

Background

Expectations for cloud-native elasticity have been rising for two reasons. The first is the rise of cloud-native concepts: from the VM era to the container era, cloud usage patterns have changed. The second is the rise of new business models, which are designed for the cloud from the start and naturally demand elasticity.

With the cloud, users no longer need to build infrastructure from physical servers and data centers. The biggest advantage of the cloud is its flexible resource supply, and in the cloud-native era the demand for elasticity keeps growing. In the VM era, elasticity meant manual operations at the minute level. In the container era, it is expected at the second level. Users face different business scenarios, and their expectations and requirements for the cloud are changing:

Periodic Business Scenarios: New types of businesses, such as live streaming, online education, and games, have one thing in common: periodicity, which prompts customers to think about elastic business architectures. In addition, the cloud-native concept naturally suggests spinning up a batch of services on demand and releasing them after use.
The Arrival of Serverless: The core concept of serverless is on-demand use and automatic elasticity. Users do not need capacity planning. However, when you start using serverless, you will encounter some problems, among which the most significant ones are elastic lag and cold start. This is unacceptable for services that are sensitive to response latency.

So, can the existing elasticity schemes in Kubernetes solve the problems in the preceding scenarios?

Problems Faced by Traditional Elastic Solutions

Generally, there are three ways to manage the number of application instances in Kubernetes: a fixed number of instances, HPA, and CronHPA. The most common approach is a fixed number of instances, whose biggest problem is wasted resources during off-peak hours. HPA addresses this waste, but it triggers scaling only after the load has changed, so resource supply lags behind demand and business stability may decline. CronHPA scales on a schedule, which seems to solve the elastic lag, but it raises two questions: how fine should the schedule granularity be, and must the schedule be adjusted manually whenever the traffic volume changes? Frequent manual adjustment adds O&M complexity and makes errors more likely.
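To make the lag of a reactive autoscaler concrete, here is a minimal sketch. It uses the replica formula documented for the Kubernetes HPA, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), and deliberately ignores HPA's tolerance, stabilization window, and pod startup time. The load values and the `simulate_reactive` helper are hypothetical and exist only for illustration.

```python
import math
from typing import List, Tuple


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float) -> int:
    """Replica count from the documented HPA formula (at least 1)."""
    return max(1, math.ceil(current_replicas * current_metric / target_metric))


def simulate_reactive(demand: List[float], target_util: float,
                      start_replicas: int) -> List[Tuple[int, float]]:
    """Return (replicas, utilization) per interval for a purely reactive scaler.

    The scaler sees the utilization of an interval first and only then
    changes the replica count for the NEXT interval, which is the lag.
    """
    replicas = start_replicas
    trace = []
    for load in demand:
        utilization = load / replicas  # what the running pods experience
        trace.append((replicas, utilization))
        replicas = hpa_desired_replicas(replicas, utilization, target_util)
    return trace


# Hypothetical total load per interval: a sudden 4x spike, then a drop.
demand = [100, 100, 400, 400, 100]
trace = simulate_reactive(demand, target_util=50.0, start_replicas=2)
for replicas, utilization in trace:
    print(f"replicas={replicas} utilization={utilization:.1f}")
```

In this sketch the spike in the third interval is served by only 2 replicas at four times the target utilization, and the scale-out to 8 replicas takes effect one interval later. A predictive scaler aims to have those replicas ready before the spike arrives.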

AHPA Elasticity Prediction

The core idea of AHPA (Advanced Horizontal Pod Autoscaler) elasticity prediction is to plan scaling on a schedule based on the detected period and to scale out in advance. However, any plan can miss something, so the planned number of instances must be adjustable in real time. AHPA therefore has two elastic strategies: active prediction and passive prediction. Active prediction uses the RobustPeriod algorithm of DAMO Academy [1] to identify the cycle length and then uses the RobustSTL algorithm [2] to generate periodic trends, proactively predicting the number of instances needed in the next cycle. Passive prediction sets the number of instances based on real-time application data to cope with traffic bursts. In addition, AHPA adds a bottom-up protection policy that lets users set the upper and lower bounds of the number of instances. The number of instances that finally takes effect in the AHPA algorithm is the maximum among the active prediction, passive prediction, and bottom-up protection strategies.
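The selection rule in the last sentence can be sketched in a few lines. This is an illustration, not AHPA's implementation: the function name `effective_replicas` and its parameters are hypothetical, and clamping the result to the user-defined upper bound is an assumption about how the bounds interact with the maximum.

```python
def effective_replicas(active_prediction: int, passive_prediction: int,
                       lower_bound: int, upper_bound: int) -> int:
    """Pick the replica count that takes effect.

    Takes the maximum of the active prediction, the passive prediction,
    and the user-defined lower bound, then (as an assumption) caps the
    result at the user-defined upper bound.
    """
    if lower_bound > upper_bound:
        raise ValueError("lower_bound must not exceed upper_bound")
    wanted = max(active_prediction, passive_prediction, lower_bound)
    return min(wanted, upper_bound)


# A traffic burst makes the passive prediction exceed the planned value.
print(effective_replicas(active_prediction=6, passive_prediction=9,
                         lower_bound=2, upper_bound=20))  # prints 9
```

Taking the maximum favors stability over cost: when the planned value underestimates a burst, the real-time value wins, and when both are low, the lower bound keeps a safety margin.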

Architecture

Elasticity must, first of all, preserve business stability. The core purpose of Auto Scaling is not only to help users save costs but also to enhance overall business stability, remove O&M burden, and build core competitiveness. The basic principles of the AHPA architecture design include:

Stability: Autoscaling while keeping user services stable.
O&M Free: Does not add O&M burden for users. No new controllers are added on the user side, and the Autoscaler configuration semantics are clearer than those of HPA.
Serverless-Oriented: The design is application-centric. Users do not need to configure the number of instances and can use resources as needed with automatic elasticity.

The following figure shows the architecture:

Rich Data Metrics: Supports CPU, memory, QPS, RT, and external metrics.
Stability Assurance: The elastic logic of AHPA is based on the strategy of active warm-up and passive bottom-up, combined with degradation protection, to ensure resource stability.
- Active prediction: Predicts the trend for an upcoming period based on historical data. It is suitable for periodic applications.
- Passive prediction: Predicts in real time. Resources are prepared in real time to handle bursty traffic.
- Degradation protection: Allows you to configure the maximum and minimum number of instances for multiple time intervals.
Multiple Scaling Methods: AHPA supports Knative, HPA, and Deployment:
- Knative: Solves the elastic cold start problem based on concurrency, QPS, or RT in serverless application scenarios.
- HPA: Simplifies the configuration of HPA elasticity policies, lowers the threshold for users to use elasticity, and solves the problem of cold start when using HPA.
- Deployment: Uses Deployments directly and scales them automatically.

Applicable Scenarios

AHPA is ideal for scenarios including:

Periodic scenarios, such as live streaming, online education, and game service scenarios.
A fixed number of instances plus elastic bottom-up protection, for example, to handle burst traffic during normal business.
Scenarios that need a recommended number of instances.

Effect of Prediction

After AHPA elasticity is enabled, a visualization page shows its effects. Here is an example of a prediction based on CPU metrics (compared to using HPA):

Instructions:

Predict CPU Observer: The actual CPU usage with HPA is represented by a blue curve. The predicted CPU usage is represented by a green curve. The green curve lies above the blue curve, indicating that the predicted capacity is sufficient.
Predict POD Observer: The actual number of Pods provisioned by HPA is represented by a blue curve. The number of Pods predicted by AHPA is represented by a green curve. The predicted number of Pods is less than the actual number of Pods.
Periodic: A prediction algorithm detects whether the application is periodic based on seven days of historical data.
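As a rough illustration of what detecting periodicity from seven days of history means, the sketch below looks for the strongest autocorrelation peak in an hourly series. This is a deliberately simple stand-in, not the RobustPeriod algorithm [1] that AHPA uses. The function names, the 0.5 threshold, and the synthetic sine-wave data are all assumptions made for the example.

```python
import math
from typing import List, Optional


def autocorrelation(series: List[float], lag: int) -> float:
    """Autocorrelation at the given lag, normalized by the full-series variance."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a flat series has no periodic structure
    covariance = sum((series[i] - mean) * (series[i + lag] - mean)
                     for i in range(n - lag))
    return covariance / variance


def detect_period(series: List[float], max_lag: int,
                  threshold: float = 0.5) -> Optional[int]:
    """Return the lag of the strongest autocorrelation peak, or None.

    A lag counts as a peak when its score is higher than the previous lag
    and not lower than the next one; peaks at or below `threshold` are ignored.
    """
    if max_lag + 1 >= len(series):
        raise ValueError("max_lag must be smaller than len(series) - 1")
    scores = [autocorrelation(series, lag) for lag in range(max_lag + 2)]
    best_lag, best_score = None, threshold
    for lag in range(1, max_lag + 1):
        is_peak = scores[lag] > scores[lag - 1] and scores[lag] >= scores[lag + 1]
        if is_peak and scores[lag] > best_score:
            best_lag, best_score = lag, scores[lag]
    return best_lag


# Synthetic hourly CPU usage over seven days with a daily cycle.
hourly_cpu = [100 + 50 * math.sin(2 * math.pi * h / 24) for h in range(7 * 24)]
print(detect_period(hourly_cpu, max_lag=72))
```

On this synthetic series the detected period is 24 hours, and a flat series yields `None`. Real metrics are noisy and can contain several overlapping cycles, which is the case robust methods such as RobustPeriod are designed for.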

The results show that AHPA can use predictive scaling to handle fluctuating workloads as expected.

Summary

To learn more, visit the documentation of Alibaba Cloud Container Service for AHPA elastic prediction at this link: https://www.alibabacloud.com/help/en/container-service-for-kubernetes/latest/ahpa

References

[1] Qingsong Wen, Kai He, Liang Sun, Yingying Zhang, Min Ke, and Huan Xu. RobustPeriod: Robust Time-Frequency Mining for Multiple Periodicity Detection, in Proc. of 2021 ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD 2021), Xi'an, China, Jun. 2021.

[2] Qingsong Wen, Jingkun Gao, Xiaomin Song, Liang Sun, Huan Xu, and Shenghuo Zhu. RobustSTL: A Robust Seasonal-Trend Decomposition Algorithm for Long Time Series, in Proc. of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), pp. 5409-5416, Honolulu, Hawaii, Jan. 2019.

[3] Qingsong Wen, Zhe Zhang, Yan Li, and Liang Sun. Fast RobustSTL: Efficient and Robust Seasonal-Trend Decomposition for Time Series with Complex Patterns, in Proc. of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2020), San Diego, CA, Aug. 2020.
