Elastic Algorithm Service (EAS) of Platform for AI (PAI) provides the elastic resource pool feature. This feature allows you to scale out a service that is deployed in a dedicated resource group even if the node resources of the resource group are insufficient. If node resources in the dedicated resource group are insufficient during a service scale-out, new instances of the service are created in the pay-as-you-go public resource group and billed based on the rules of the public resource group. During a service scale-in, service instances that reside in the public resource group are released first.
Prerequisites
A dedicated resource group is created. For more information, see Work with dedicated resource groups.
Background information
You can create subscription or pay-as-you-go instances for an EAS dedicated resource group. This way, you can purchase sufficient resources in a cost-effective manner.
During actual use, you may want services in dedicated resource groups to be scalable. For example, during peak hours you may require more pay-as-you-go resources that can be automatically released during off-peak hours. To meet this need, EAS provides the horizontal auto scaling feature, which automatically adds service instances to and removes them from a service. However, for a dedicated resource group, the maximum number of service instances for a service is limited by the node resources of the resource group, and manually adding or removing node resources is inefficient and inconvenient. To remove this limit, EAS allows service instances to be created in the public resource group during horizontal scaling.
Benefits
You can use the elastic resource pool feature together with horizontal auto scaling to automatically scale services deployed in dedicated resource groups based on metrics such as queries per second (QPS) and CPU utilization, without being limited by the node resources in the resource groups. In this case, your services are billed by using a combination of the subscription and pay-as-you-go billing methods, which reduces costs.
Procedure
Enable auto scaling for a service when you deploy the service
Enable resource auto scaling in the console
Go to the Deploy Service page. For more information, see Model service deployment by using the PAI console.
In the Resource Deployment Information section of the Deploy Service page, configure the required parameters. The following table describes key parameters. For more information about other parameters, see Model service deployment by using the PAI console.
Resource Group Type: Select an existing dedicated resource group.

Elastic Resource Pool: Turn on Elastic Resource Pool to enable the elastic resource pool feature for the services that are deployed in the dedicated resource group. Then, configure the instance type. For more information, see the Resource Configuration Mode section in the Model service deployment by using the PAI console topic.
If you turn on Elastic Resource Pool and the dedicated resource group that you use to deploy services is fully occupied, the system automatically adds pay-as-you-go instances to the public resource group during scale-outs. The added instances are billed as public resources. The instances in the public resource group are preferentially released during scale-ins.
Click Deploy.
Enable resource auto scaling in the client
You can enable auto scaling for a service when you deploy the service by using the EASCMD client. The following procedure uses a 64-bit Windows server as an example.
Configure a JSON file.
Important: The configuration methods of resources and the virtual private cloud (VPC) direct connection feature vary based on the resource group type of the service. In the public resource group, you can use the cloud.computing parameter to specify the required node type and obtain more resources for a service. You can also use the cloud.networking parameter to enable the VPC direct connection feature for the service. If a service is deployed in a dedicated resource group, you can enable the VPC direct connection feature only for the dedicated resource group. If you deploy a service in a dedicated resource group and enable the elastic resource pool feature for the service, you must configure the cloud.networking parameter to ensure the availability of VPC direct connections during service scaling.
The following code provides sample content of the JSON file:
{
  "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/lr.pmml",
  "name": "test_burstable_service",
  "processor": "pmml",
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "resource": "eas-r-xxx",
    "resource_burstable": true
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.r7.2xlarge"
    },
    "networking": {
      "security_group_id": "sg-uf68iou5an8j7sxd****",
      "vswitch_id": "vsw-uf6nji7pzztuoe9i7****"
    }
  }
}
In the preceding code:
resource_burstable: Specifies whether the elastic resource pool feature is enabled for the service. Set this parameter to true to enable the feature.
cloud.networking: This parameter does not take effect for services that are deployed in dedicated resource groups. However, if you enable the elastic resource pool feature for the service, you must configure this parameter to ensure the availability of VPC direct connections during service scaling.
cloud.computing: Optional. You can specify the required node type in the public resource group during a scale-out. For more information, see Work with the public resource group.
For information about other parameters, see Parameters of model services.
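If you generate the JSON file programmatically, a small helper can assemble the configuration shown above and catch a common mistake: enabling resource_burstable without configuring cloud.networking, which leaves VPC direct connection unavailable for instances created in the public resource group. The following Python sketch is illustrative only; the model path, service name, and instance type are the sample placeholders from this topic, and build_service_config is a hypothetical helper, not part of EASCMD.

```python
import json


def build_service_config(resource_id, security_group_id, vswitch_id, burstable=True):
    """Assemble an EAS service configuration like the sample above.

    The model path, service name, and instance type below are the sample
    placeholders from this topic; replace them with your own values before
    you deploy the service by using EASCMD.
    """
    config = {
        "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/lr.pmml",
        "name": "test_burstable_service",
        "processor": "pmml",
        "metadata": {
            "instance": 1,
            "cpu": 1,
            "resource": resource_id,
            # Enables the elastic resource pool feature for the service.
            "resource_burstable": burstable,
        },
        "cloud": {
            # Optional: the node type to use in the public resource group
            # during a scale-out.
            "computing": {"instance_type": "ecs.r7.2xlarge"},
            # Required when resource_burstable is true and VPC direct
            # connection must stay available during scaling.
            "networking": {
                "security_group_id": security_group_id,
                "vswitch_id": vswitch_id,
            },
        },
    }
    # Guard against the pitfall described in the Important note above.
    if config["metadata"]["resource_burstable"] and "networking" not in config["cloud"]:
        raise ValueError("resource_burstable=true requires cloud.networking")
    return json.dumps(config, indent=2)
```

You can write the returned string to a file such as service.json and pass that file to the EASCMD deployment command.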
Deploy the service by using the EASCMD client. For more information, see Deploy model services by using EASCMD or DSW.
If a dedicated resource group is insufficient to support a service scale-out, the service instances added to the service use the public resource group.
Enable or disable auto scaling for a service after the service is deployed
Use the console
Go to the Elastic Algorithm Service (EAS) page. For more information, see Model service deployment by using the PAI console.
Click Update Service in the Actions column of the service.
In the Resource Deployment Information section of the Deploy Service page, enable or disable the resource auto scaling feature.
Enable resource auto scaling
In the Resource Deployment Information section, turn on Elastic Resource Pool and configure the instance type of the public resource group.
Disable resource auto scaling
In the Resource Deployment Information section, turn off Elastic Resource Pool.
Click Deploy.
Use the EASCMD client
You can run the following commands to enable or disable the elastic resource pool feature for a deployed service. In the following example, a 64-bit Windows server is used.
If you did not configure the cloud.networking parameter when you deployed a service in a dedicated resource group and you enabled the elastic resource pool feature for the service, VPC direct connections are unavailable for the new service instances that are added to the public resource group.
# Enable the elastic resource pool feature for a deployed service.
eascmdwin64.exe modify <service_name> -Dmetadata.resource_burstable=true
# Disable the elastic resource pool feature for a deployed service.
eascmdwin64.exe modify <service_name> -Dmetadata.resource_burstable=false
Replace <service_name> with the name of the service that you want to manage.
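If you toggle this setting from a script, you can build the EASCMD command line shown above before running it. The following Python sketch is an assumption-laden convenience wrapper, not part of EASCMD: it only constructs the argument list (for example, for subprocess.run), using the eascmdwin64.exe binary name from this topic as the default.

```python
def burstable_modify_command(service_name, enable, eascmd="eascmdwin64.exe"):
    """Build the EASCMD command that enables or disables the elastic
    resource pool feature for a deployed service, as shown above.

    Returns the command as an argument list so that you can inspect it or
    pass it to subprocess.run; nothing is executed here.
    """
    flag = "true" if enable else "false"
    return [eascmd, "modify", service_name, f"-Dmetadata.resource_burstable={flag}"]
```

For example, burstable_modify_command("my_service", True) produces the same enable command as the snippet above, with my_service in place of <service_name>.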
The elastic resource pool feature takes effect only for new service instances. For example, if a service is scaled out and has two pending service instances before you enable the elastic resource pool feature for the service, the two instances are not automatically migrated to the public resource group after you enable the elastic resource pool feature. When you restart the two instances in the PAI console, the instances are scheduled to the public resource group. If specific service instances are scheduled to the public resource group after the elastic resource pool feature is enabled for a service, the service instances are not automatically scheduled back to the dedicated resource group after the feature is disabled.
References
You can enable the horizontal auto scaling feature to allow the system to automatically scale instances based on the metrics that you specify. For more information, see Enable or disable the horizontal auto scaling feature.
For information about how to automatically scale the number of instances to a specific number, see Scheduled scaling.