Container Service for Kubernetes: Auto scaling overview

Last Updated: Sep 14, 2024

Auto scaling is a management service that dynamically scales computing resources to meet your business requirements. It is suitable for online workloads, large-scale computing and training tasks, GPU-accelerated deep learning tasks, and model inference and training tasks that use shared GPUs. This topic describes the auto scaling solutions supported by ACK Serverless clusters.

The following entries describe each solution by its overview, scaling metrics, scenarios, supported resource types, and references.

HPA

Overview: Horizontal Pod Autoscaler (HPA) is the most commonly used Kubernetes solution for automatically scaling pods. HPA rapidly scales out replicated pods to absorb the load when workloads surge, and scales in to save resources when demand subsides.

Scenarios: Businesses with large fluctuations in service demand, numerous services, and frequent scaling requirements, such as e-commerce, online education, and financial services.

Supported resource types: Objects that are compatible with the scale interface, such as Deployments and StatefulSets.

References: Horizontal pod autoscaling
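To make the scale-out and scale-in behavior of HPA concrete, the following minimal sketch applies the ratio rule documented for the upstream Kubernetes HPA: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up. The function name, parameter names, and the replica bounds are illustrative assumptions; the 0.1 tolerance mirrors the upstream default, and the sketch ignores details such as pod readiness and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=100, tolerance=0.1):
    """Return a replica count using the HPA ratio rule (simplified sketch)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Skip scaling when the observed metric is close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Multiply before dividing so that exact cases stay exact.
    wanted = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds (hypothetical defaults).
    return max(min_replicas, min(max_replicas, wanted))


# Example: 4 pods at 90% average CPU utilization against a 50% target.
print(desired_replicas(4, 90, 50))
```

With these inputs the ratio is 1.8, so the sketch asks for 8 replicas; a reading of 52% against the same target falls within the tolerance and leaves the count unchanged.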

CronHPA

Overview: Cron Horizontal Pod Autoscaler (CronHPA) scales pods in the cluster based on crontab-like schedules. It supports time zones, execution dates, and exclusion dates (such as holidays), and can work in coordination with HPA.

Scaling metrics: Scheduled scaling.

Scenarios: Applications whose traffic has pronounced peak periods, or that need to perform tasks at specific times.

Supported resource types: Resources such as Deployments and StatefulSets.

References: Use CronHPA for scheduled horizontal scaling
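The idea of schedule-driven scaling with exclusion dates can be sketched as follows. This is only an illustration of the concept: the `ScheduledJob` type, its fields, and the `due_target` helper are hypothetical, the schedule is reduced to a fixed hour and minute instead of a full crontab-like expression, and time zone handling is omitted.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, Optional, Set


@dataclass
class ScheduledJob:
    """One scheduled scaling action (hypothetical structure)."""
    name: str
    hour: int
    minute: int
    target_size: int


def due_target(jobs: Iterable[ScheduledJob], now: datetime,
               excluded_dates: Set[date]) -> Optional[int]:
    """Return the replica count to apply at `now`, or None if nothing is due."""
    # Exclusion dates (for example, holidays) suppress every job on that day.
    if now.date() in excluded_dates:
        return None
    for job in jobs:
        if (job.hour, job.minute) == (now.hour, now.minute):
            return job.target_size
    return None


jobs = [
    ScheduledJob("scale-up", hour=8, minute=0, target_size=10),
    ScheduledJob("scale-down", hour=20, minute=0, target_size=2),
]
holidays = {date(2024, 10, 1)}

print(due_target(jobs, datetime(2024, 9, 16, 8, 0), holidays))
print(due_target(jobs, datetime(2024, 10, 1, 8, 0), holidays))
```

The first call matches the scale-up job and yields 10; the second falls on an excluded date, so no scaling action is due.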

VPA

Overview: Vertical Pod Autoscaler (VPA) monitors pod resource usage, provides CPU and memory allocation recommendations, and automatically adjusts these settings as needed without changing the number of pod replicas.

Scaling metrics: VPA recommends and automatically adjusts the CPU and memory requests and limits of the containers in a pod.

Scenarios: Stateful applications or monolithic applications that require a stable resource supply. VPA is typically applied when pods recover from anomalies.

Supported resource types: Resources such as Deployments, DaemonSets, and StatefulSets.

References: Vertical pod autoscaling
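How a resource recommendation might be derived from observed usage can be illustrated with a simple percentile-plus-margin heuristic. This is an assumption made for illustration, not VPA's actual recommender: the function name, the 90th percentile, and the 15% margin are hypothetical defaults, and the samples are assumed to be CPU usage in millicores.

```python
import math
from typing import Sequence


def recommend_request(usage_samples: Sequence[int],
                      percentile: int = 90,
                      margin_percent: int = 15) -> int:
    """Suggest a resource request from usage samples (illustrative heuristic)."""
    if not usage_samples:
        raise ValueError("at least one usage sample is required")
    ordered = sorted(usage_samples)
    # Nearest-rank percentile: the smallest sample that covers `percentile`
    # percent of all observations.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    base = ordered[rank - 1]
    # Add headroom so that short spikes above the percentile still fit.
    return math.ceil(base * (100 + margin_percent) / 100)


cpu_millicores = [100, 110, 120, 130, 140, 150, 160, 170, 180, 400]
print(recommend_request(cpu_millicores))
```

For these samples the 90th percentile is 180 millicores, so the sketch suggests a request of 207 millicores; the single 400-millicore outlier does not drive the recommendation. The replica count plays no part in the calculation, which matches the vertical nature of this approach.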

AHPA

Overview: Advanced Horizontal Pod Autoscaler (AHPA) automatically identifies workload fluctuations and predicts resource demand from historical metric data to help you implement predictive scaling.

Scaling metrics:

  • Resource metrics such as CPU, memory, and GPU usage.

  • Traffic metrics such as queries per second (QPS) and response time (RT).

  • Other custom metrics.

Scenarios: Applications whose workloads fluctuate periodically, such as live streaming, online education, and gaming applications.

Supported resource types: Resources such as Deployments and Knative Services.

References: AHPA overview
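Predictive scaling from historical data can be illustrated with a seasonal average: the load expected in the next interval is estimated from the load seen at the same point in earlier periods, and the replica count is sized ahead of time. This is a deliberately simple stand-in and not AHPA's prediction algorithm; the function names, the period length, and the per-pod capacity are all hypothetical.

```python
import math
from typing import Sequence


def forecast_next(history: Sequence[float], period: int) -> float:
    """Predict the next value as the mean of the same phase in past periods."""
    if period <= 0 or len(history) < period:
        raise ValueError("history must cover at least one full period")
    # Phase of the upcoming interval within the repeating period.
    phase = len(history) % period
    same_phase = history[phase::period]
    return sum(same_phase) / len(same_phase)


def replicas_for(predicted_load: float, per_pod_capacity: float,
                 min_replicas: int = 1) -> int:
    """Size the workload so that each pod stays within its assumed capacity."""
    return max(min_replicas, math.ceil(predicted_load / per_pod_capacity))


# Two periods of QPS samples, four intervals per period (made-up numbers).
qps_history = [100, 400, 300, 100, 120, 420, 310, 110]
predicted = forecast_next(qps_history, period=4)
print(predicted, replicas_for(predicted, per_pod_capacity=50))
```

Here the upcoming interval lines up with the first interval of each period (100 and 120 QPS), so the forecast is 110 QPS and the sketch provisions 3 replicas before the load arrives.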