Horizontal auto scaling

If your workloads fluctuate between distinct peak and off-peak periods, you can enable auto scaling to prevent resource waste. After you enable this feature, Elastic Algorithm Service (EAS) automatically manages the computing resources of online services by adjusting the number of service instances. This keeps your business stable and improves resource utilization. This topic describes how to configure auto scaling and how the number of service instances is calculated during auto scaling.

Background information

You can configure auto scaling in the PAI console (Method 1) or by using a client (Method 2). For more information about how the number of instances is calculated after you enable auto scaling, see the "Auto scaling policies" section of this topic.

Method 1: Manage the auto scaling feature in the PAI console

Enable auto scaling

Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
In the left-side navigation pane of the workspace page, choose Model Deployment > Elastic Algorithm Service (EAS). The Elastic Algorithm Service (EAS) page appears.
Use one of the following methods to open the Auto Scaling Settings dialog box.
- Method 1:
  1. On the EAS-Online Model Services page, click the name of the service that you want to manage to go to the Service Details page.
  2. Click the Auto Scaling tab and then click Enable Auto Scaling in the Auto Scaling section.
- Method 2: On the EAS-Online Model Services page, find the service that you want to manage and choose > Auto Scaling in the Actions column. The Auto Scaling dialog box appears.

In the Auto Scaling Settings dialog box, configure the parameters.

Basic Settings

Parameter	Description
Minimum Number of Instances	The minimum number of instances that can be occupied by a model service. The value of this parameter must be greater than or equal to 0.
Maximum Number of Instances	The maximum number of instances that can be occupied by a model service. The value of this parameter must be less than or equal to 3000.
General Scaling Metrics	Select a general scaling metric from the drop-down list and specify the scaling threshold. Valid values: QPS Threshold of Individual Instance: the queries per second (QPS) threshold. If the average QPS of a single instance is greater than the threshold, a scale-out is triggered. CPU Utilization Threshold: the CPU utilization threshold. If the average CPU utilization of a single instance is greater than the threshold, a scale-out is triggered. Queue Size for Asynchronous Requests: the queue size threshold. This metric applies only to asynchronous services. If the number of requests in the queue is greater than the threshold, a scale-out is triggered. GPU Usage Threshold: the GPU utilization threshold. If the average GPU utilization of a single instance is greater than the threshold, a scale-out is triggered.
Custom Scaling Metric	You can configure custom scaling metrics and the corresponding scaling thresholds.

Advanced Settings

Parameter	Description
Scale-out Starts in	The duration from the time when the scale-out is triggered to the time when the scale-out starts. If the system detects that the number of requests decreases during the specified period of time, the system automatically cancels the scale-out. Default value: 1. Unit: seconds.
Scale-in Starts in	The duration from the time when the scale-in is triggered to the time when the scale-in starts. If the system detects that the number of requests increases during the specified period of time, the system automatically cancels the scale-in. Default value: 300. Unit: seconds.
Scale-in to 0 Instance Starts in	The duration from the time when the scale-in is triggered to the time when all instances are removed.
Number of Instances to Add from 0	The duration from the time when the system starts to add instances from 0 to the time when the specified number of instances are added.

Click Enable.

Update the configurations of an auto scaling task

Use one of the following methods to open the Auto Scaling Settings dialog box.
- Method 1: In the Auto Scaling section of the Auto Scaling tab, click Update.
- Method 2: On the EAS-Online Model Services page, find the service that you want to manage and choose > Auto Scaling in the Actions column. The Auto Scaling dialog box appears.
In the Auto Scaling Settings dialog box, modify the configurations.
Click Update.

Disable auto scaling

In the Auto Scaling section of the Auto Scaling tab, click Disable Auto Scaling.
In the confirmation dialog box that appears, click OK.

Method 2: Manage the auto scaling feature by using a client

Enable auto scaling or update the auto scaling policy

By default, the auto scaling feature is disabled after you create a service. You can log on to the EASCMD client and run an autoscale subcommand to enable this feature. For more information about how to log on to the EASCMD client, see Download the EASCMD client and complete user authentication. You can use one of the following methods to enable auto scaling or update the scaling policy:

Configure parameters (recommended)

Command syntax

eascmd autoscale [region]/[service_name] -D[attr_name]=[attr_value]

Example

eascmd autoscale cn-shanghai/test_autoscaler -Dmin=2 -Dmax=5 -Dstrategies.qps=10
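
The following Python sketch shows one way to assemble this command line from a nested set of attributes. It only builds and prints the argument list and does not run EASCMD. The helper name build_autoscale_args is hypothetical and is not part of EASCMD. The sketch assumes that nested attributes are joined with dots, as in strategies.qps in the preceding example.

```python
def build_autoscale_args(region, service_name, attrs):
    """Build the argument list for `eascmd autoscale` using -D attributes.

    Hypothetical helper for illustration; it follows the documented syntax
    eascmd autoscale [region]/[service_name] -D[attr_name]=[attr_value].
    """
    def flatten(prefix, value):
        # Nested dictionaries become dotted names, for example strategies.qps.
        if isinstance(value, dict):
            for key, sub_value in value.items():
                name = f"{prefix}.{key}" if prefix else key
                yield from flatten(name, sub_value)
        else:
            yield prefix, value

    args = ["eascmd", "autoscale", f"{region}/{service_name}"]
    for name, value in flatten("", attrs):
        args.append(f"-D{name}={value}")
    return args


policy = {"min": 2, "max": 5, "strategies": {"qps": 10}}
print(" ".join(build_autoscale_args("cn-shanghai", "test_autoscaler", policy)))
# prints: eascmd autoscale cn-shanghai/test_autoscaler -Dmin=2 -Dmax=5 -Dstrategies.qps=10
```

To run the command from Python, you could pass the returned list to subprocess.run, provided that EASCMD is installed and authenticated.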

Configure the description file

Command syntax

eascmd autoscale [region]/[service_name] -s [desc_json]

You can configure the scaling policy in the desc_json file. The following sample code shows how to configure the desc_json file:

{
    "min": 2,
    "max": 5,
    "strategies": {
        "qps": 10
    }
}

Parameter	Description
min	The minimum number of instances. The value must be greater than 0. Note: The number of instances never drops below this value, even if the number that is calculated based on the scaling metrics is smaller.
max	The maximum number of instances. The value can be up to 300. Note: The number of instances never exceeds this value, even if the number that is calculated based on the scaling metrics is greater.
strategies	You can use the qps and cpu parameters to configure auto scaling policies based on your business requirements. qps: the QPS threshold that is used to trigger a scale-out for a single instance. If the average QPS of a single instance is greater than the threshold, a scale-out is automatically triggered for the instance. cpu: the CPU utilization threshold that is used to trigger a scale-out for a single instance. Valid values: 0 to 100. If the average CPU utilization of a single instance is greater than the threshold, a scale-out is automatically triggered for the instance.

Example

eascmd autoscale cn-shanghai/test_autoscaler -s scaler.json
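
If you generate the description file from a script, a small amount of validation can catch mistakes before you run the command. The following Python sketch builds a policy with the min, max, and strategies fields described in the preceding table. The function name build_policy is hypothetical. The check that min does not exceed max is an assumption made for this sketch. The check on cpu follows the valid values of 0 to 100 stated in the table.

```python
import json


def build_policy(min_instances, max_instances, qps=None, cpu=None):
    """Return a scaling policy dictionary in the desc_json layout.

    Hypothetical helper for illustration, not part of EASCMD.
    """
    if min_instances > max_instances:
        # Assumption of this sketch: min must not exceed max.
        raise ValueError("min must not be greater than max")
    if qps is None and cpu is None:
        raise ValueError("specify at least one of qps or cpu")
    if cpu is not None and not 0 <= cpu <= 100:
        # The parameter table lists 0 to 100 as valid values for cpu.
        raise ValueError("cpu must be between 0 and 100")

    strategies = {}
    if qps is not None:
        strategies["qps"] = qps
    if cpu is not None:
        strategies["cpu"] = cpu
    return {"min": min_instances, "max": max_instances, "strategies": strategies}


policy = build_policy(2, 5, qps=10)
# Save the printed JSON as scaler.json and pass it with the -s option.
print(json.dumps(policy, indent=4))
```

With the arguments shown, the printed JSON has the same content as the sample description file above.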

Disable auto scaling

Command syntax

eascmd autoscale rm [region]/[service_name]

Example

eascmd autoscale rm cn-shanghai/test_autoscaler

Auto scaling policies

You can use the following formula to calculate the number of instances during an auto scaling operation:
```
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
```
The following list describes the variables in the preceding formula.
- desiredReplicas: the desired number of instances.
- currentReplicas: the current number of instances.
- currentMetricValue: the current average value of the scaling metric.
- desiredMetricValue: the desired average value of the scaling metric.
Example
In this example, the QPS metric is used. If the tested or estimated QPS of a single instance is 10 when you deploy the service, set the QPS threshold (strategies.qps) to 10. Assume that the service runs 2 instances and the average QPS of each instance rises to 23. The desired number of instances is ceil[2 × (23/10)] = ceil[4.6] = 5.
If the total QPS then drops to 10, the average QPS of each of the 5 instances is 2. The desired number of instances is ceil[5 × (2/10)] = 1, so the number of instances is gradually reduced to 1. The system scales in gradually to prevent anomalies caused by traffic fluctuations.
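The calculation in this example can be reproduced with a short Python sketch. The function name desired_replicas and the optional min_replicas and max_replicas arguments are hypothetical names chosen for illustration. The clamping step reflects the notes on the min and max parameters earlier in this topic. Exact fractions are used so that floating-point rounding cannot push a whole-number result up to the next integer.

```python
import math
from fractions import Fraction


def desired_replicas(current_replicas, current_metric, desired_metric,
                     min_replicas=None, max_replicas=None):
    """Apply ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]."""
    if desired_metric <= 0:
        raise ValueError("desired_metric must be greater than 0")
    # Convert through str() so that values such as 0.1 are treated exactly.
    ratio = Fraction(str(current_metric)) / Fraction(str(desired_metric))
    desired = math.ceil(current_replicas * ratio)
    # Keep the result within the configured bounds when they are given.
    if min_replicas is not None:
        desired = max(desired, min_replicas)
    if max_replicas is not None:
        desired = min(desired, max_replicas)
    return desired


print(desired_replicas(2, 23, 10))   # scale-out: ceil(4.6) -> 5
print(desired_replicas(5, 2, 10))    # scale-in: ceil(1.0) -> 1
print(desired_replicas(5, 2, 10, min_replicas=2, max_replicas=5))  # clamped to 2
```

The last call shows that if min were set to 2, the scale-in in this example would stop at 2 instances instead of 1.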

References

For information about how to configure scheduled scaling, see Scheduled scaling.
For information about how to configure resources in a flexible manner to meet business requirements, see Elastic resource pool.
For information about how to configure custom metrics to obtain the resource utilization after an EAS service performs auto scaling, see Configure a custom monitoring and scaling metric.