Elastic Algorithm Service (EAS) on Platform for AI (PAI) introduces the elastic resource pool capability, enabling seamless service scaling beyond dedicated resource group limitations. When your dedicated resources reach capacity during scale-out operations, EAS automatically provisions additional service instances in pay-as-you-go public resources while maintaining cost-efficient billing. During scale-in events, the system prioritizes releasing public resource instances first, optimizing your resource utilization and cost management.
Prerequisites
Ensure you have created a dedicated resource group. Refer to Work with EAS resource groups for detailed setup instructions.
Background information
You can provision subscription or pay-as-you-go instances within your EAS dedicated resource group, allowing you to acquire adequate computing resources through cost-effective purchasing strategies.
In practical scenarios, you'll likely need your services within dedicated resource groups to accommodate dynamic scaling requirements. For instance, during traffic peaks, you may need additional pay-as-you-go resources that can automatically scale down during quieter periods. While EAS offers automatic horizontal scaling to dynamically add and remove service instances, dedicated resource groups face inherent limitations—the maximum service instances are constrained by available node resources, and manual resource adjustment proves both inefficient and cumbersome. The elastic resource pool feature resolves this constraint by enabling service instance creation in public resources during horizontal scaling operations.
Benefits
By combining the elastic resource pool feature with horizontal auto scaling, you can achieve automated service scaling in dedicated resource groups based on key performance metrics like queries per second (QPS) and CPU utilization, transcending traditional node resource limitations. This hybrid approach leverages both subscription and pay-as-you-go billing models to optimize your operational costs while maintaining service performance.
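The placement policy described above can be sketched as a toy model. The following Python snippet is purely illustrative and is not an EAS API: it only mimics the documented behavior of filling the dedicated resource group first, spilling over to public resources during scale-out, and releasing public instances first during scale-in.

```python
# Illustrative only: a toy model of the placement policy described above.
# None of these names are EAS APIs.

def scale_out(dedicated, public, dedicated_capacity, count):
    """Place `count` new instances, preferring the dedicated group."""
    for _ in range(count):
        if dedicated < dedicated_capacity:
            dedicated += 1   # room left in the dedicated resource group
        else:
            public += 1      # spill over to pay-as-you-go public resources
    return dedicated, public

def scale_in(dedicated, public, count):
    """Remove `count` instances, releasing public instances first."""
    for _ in range(count):
        if public > 0:
            public -= 1      # public instances are released first
        else:
            dedicated -= 1
    return dedicated, public

# The dedicated group holds at most 3 instances; scale out by 3 from 2.
d, p = scale_out(dedicated=2, public=0, dedicated_capacity=3, count=3)
print(d, p)   # 3 2 -> one instance fits in the dedicated group, two spill over

d, p = scale_in(d, p, count=3)
print(d, p)   # 2 0 -> both public instances are released first
```

This is only a mental model; in practice the scale-out and scale-in decisions themselves are driven by the horizontal auto scaling metrics you configure.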
Procedure
Enable auto scaling during service deployment
Use the console
- Log on to the PAI console. Select a region at the top of the page, then select the desired workspace and click Elastic Algorithm Service (EAS).
- Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.
- Navigate to the Resource Deployment section of the Custom Deployment page and configure the key parameters described below. For details about other parameters, see Model service deployment by using the PAI console.
  - Resource Type: Select EAS Resource Group.
  - Resource Group: Select an existing dedicated resource group from the drop-down list.
  - Elastic Resource Pool: Turn on the Elastic Resource Pool switch and select a public resource group as the Resource Type to enable the elastic resource pool for services deployed in the dedicated resource group. When Elastic Resource Pool is turned on and the dedicated resource group reaches full capacity, the system automatically creates pay-as-you-go instances in the public resource group during scale-out. These instances are billed at public resource prices and are released first during scale-in.

- Click Deploy to complete the deployment.
Use the client
You can configure auto-scaling capabilities during service deployment using the EASCMD client. The following instructions demonstrate the process using a Windows 64-bit server as an example.
- Configure a JSON file.
Important: Resource configuration methods and Virtual Private Cloud (VPC) direct connection capabilities differ based on the resource group type of your service. In public resource groups, use the cloud.computing parameter to specify node types and allocate additional resources for the service, and the cloud.networking parameter to enable VPC direct connection. For services deployed in dedicated resource groups, VPC direct connection can be configured only at the resource group level. However, if you deploy a service in a dedicated resource group with the elastic resource pool enabled, you must configure the cloud.networking parameter so that VPC direct connection remains available during service scaling.
The following code provides sample content of the JSON file:
{
  "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/lr.pmml",
  "name": "test_burstable_service",
  "processor": "pmml",
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "resource": "eas-r-xxx",
    "resource_burstable": true
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.r7.2xlarge"
    },
    "networking": {
      "security_group_id": "sg-uf68iou5an8j7sxd****",
      "vswitch_id": "vsw-uf6nji7pzztuoe9i7****"
    }
  }
}
In the preceding code:
- resource_burstable: Specifies whether to enable auto scaling for the service. Set this parameter to true to enable auto scaling.
- cloud.networking: This parameter has no effect on services deployed in dedicated resource groups. However, if you enable the elastic resource pool for your service, configuring this parameter ensures continuous VPC direct connection during scaling.
- cloud.computing: Optional. Specifies the node types to use in the public resource group during scale-out. For detailed specifications, see Use public resources.
For comprehensive parameter documentation, refer to JSON deployment.
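If you generate the deployment description from a script rather than editing JSON by hand, a short Python sketch can help keep the file valid. The field names below follow the sample above; the model path, resource group ID, security group ID, and vSwitch ID are placeholders that you must replace with your own values.

```python
import json

# Sketch: build the deployment description programmatically. All IDs and
# the model path below are placeholder values from the sample above, not
# real resources -- substitute your own.
service = {
    "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/lr.pmml",
    "name": "test_burstable_service",
    "processor": "pmml",
    "metadata": {
        "instance": 1,
        "cpu": 1,
        "resource": "eas-r-xxx",       # your dedicated resource group ID
        "resource_burstable": True,    # enable the elastic resource pool
    },
    "cloud": {
        "computing": {"instance_type": "ecs.r7.2xlarge"},
        "networking": {
            "security_group_id": "sg-uf68iou5an8j7sxd****",
            "vswitch_id": "vsw-uf6nji7pzztuoe9i7****",
        },
    },
}

# Write the file that you pass to the EASCMD client.
with open("service.json", "w") as f:
    json.dump(service, f, indent=2)
```

Generating the file this way makes it easy to toggle resource_burstable or swap the instance_type from a deployment script while guaranteeing the output is well-formed JSON.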
- Deploy the service by using the EASCMD client. For detailed instructions, see Deploy a model by using the EASCMD client.
When your dedicated resource group lacks sufficient capacity for a scale-out, newly added service instances are automatically created in the public resource group.
Manage auto-scaling for deployed services
Use the console
- Log on to the PAI console. Select a region at the top of the page, then select the desired workspace and click Elastic Algorithm Service (EAS).
- Click Update in the Actions column of the service that you want to manage.
- In the Resource Information section of the Update Service page, enable or disable resource auto scaling.
  - Enable resource auto scaling: In the Resource Information section, turn on the Elastic Resource Pool switch and specify the public resource group type.
  - Disable resource auto scaling: In the Resource Information section, turn off the Elastic Resource Pool switch.
- Click Update to confirm your changes.
Use the client
Execute the following commands to activate or deactivate the elastic resource pool functionality for your deployed service. The examples below demonstrate usage on a Windows 64-bit server.
If the cloud.networking parameter was not configured during initial service deployment in a dedicated resource group, enabling the elastic resource pool feature afterward will result in unavailable VPC direct connections for newly added service instances in the public resource group.
# Enable the elastic resource pool feature for a deployed service.
eascmdwin64.exe modify <service_name> -Dmetadata.resource_burstable=true
# Disable the elastic resource pool feature for a deployed service.
eascmdwin64.exe modify <service_name> -Dmetadata.resource_burstable=false
Replace <service_name> with the name of your service.
The elastic resource pool functionality applies exclusively to newly created service instances. For instance, if a service undergoes scale-out with two existing pending instances before activating the elastic resource pool feature, these instances will not automatically transition to the public resource group upon feature activation. To migrate existing instances, you must restart them through the PAI console, which will reschedule them to the public resource group. Similarly, service instances already assigned to the public resource group will remain there even after disabling the elastic resource pool feature—they won't automatically revert to the dedicated resource group.
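This "new instances only" behavior can be sketched with another toy model. The following Python snippet is not an EAS API; it only illustrates that toggling resource_burstable changes where instances created after the change are placed, while existing instances keep their current placement until you restart them.

```python
# Illustrative only (not an EAS API): the resource_burstable flag affects
# only instances placed after it changes; it never moves existing ones.

def place_instance(burstable, dedicated_free):
    """Return the group a newly created instance lands in."""
    if dedicated_free > 0:
        return "dedicated"
    # Without the elastic resource pool, the instance waits for capacity.
    return "public" if burstable else "pending"

# Dedicated group full, feature off: a new instance stays pending.
print(place_instance(burstable=False, dedicated_free=0))  # pending

# After enabling the feature, only *new* placements go to public resources.
print(place_instance(burstable=True, dedicated_free=0))   # public

# An instance that was already pending is untouched by the toggle; per the
# text above, you must restart it in the PAI console to reschedule it.
existing = {"id": "i-1", "group": "pending"}
assert existing["group"] == "pending"  # unchanged by enabling the feature
```

The same asymmetry holds in the other direction: disabling the feature leaves instances already running in the public resource group where they are.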
References
- To enable automatic instance scaling based on metrics that you specify, see Horizontal auto scaling.
- To configure automatic scaling that maintains a specific instance count on a schedule, see Scheduled auto scaling.