Platform for AI: CreateServiceAutoScaler

Last Updated: Jan 28, 2026

Enables the Autoscaler feature and creates an Autoscaler controller for a service.

RAM authorization

The table below describes the authorization required to call this API. You can define it in a Resource Access Management (RAM) policy. The table's columns are detailed below:

  • Action: The actions can be used in the Action element of RAM permission policy statements to grant permissions to perform the operation.

  • API: The API that you can call to perform the action.

  • Access level: The predefined level of access granted for each API. Valid values: create, list, get, update, and delete.

  • Resource type: The type of the resource that supports authorization to perform the action. It indicates if the action supports resource-level permission. The specified resource must be compatible with the action. Otherwise, the policy will be ineffective.

    • For APIs with resource-level permissions, required resource types are marked with an asterisk (*). Specify the corresponding Alibaba Cloud Resource Name (ARN) in the Resource element of the policy.

    • For APIs without resource-level permissions, it is shown as All Resources. Use an asterisk (*) in the Resource element of the policy.

  • Condition key: The condition keys defined by the service. The key allows for granular control, applying to either actions alone or actions associated with specific resources. In addition to service-specific condition keys, Alibaba Cloud provides a set of common condition keys applicable across all RAM-supported services.

  • Dependent action: The dependent actions required to run the action. To complete the action, the RAM user or the RAM role must have the permissions to perform all dependent actions.

| Action | Access level | Resource type | Condition key | Dependent action |
| --- | --- | --- | --- | --- |
| eas:CreateServiceAutoScaler | create | *Service: acs:eas:{#regionId}:{#accountId}:service/{#ServiceName} | None | None |
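
As a rough illustration of the table above, the following sketch assembles a RAM policy document that grants eas:CreateServiceAutoScaler on a single service. The helper name build_autoscaler_policy and the sample account ID are hypothetical; only the action name and the resource ARN pattern come from the table.

```python
import json


def build_autoscaler_policy(region_id: str, account_id: str, service_name: str) -> dict:
    """Return a RAM policy dict allowing eas:CreateServiceAutoScaler on one service.

    Hypothetical helper: the ARN layout follows the Resource type column above.
    """
    resource = f"acs:eas:{region_id}:{account_id}:service/{service_name}"
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "eas:CreateServiceAutoScaler",
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # The account ID is a placeholder, not a real account.
    policy = build_autoscaler_policy("cn-shanghai", "1234567890****", "foo")
    print(json.dumps(policy, indent=2))
```

To grant the permission on all services instead, the Resource value could be replaced with an asterisk (*).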

Request syntax

POST /api/v2/services/{ClusterId}/{ServiceName}/autoscaler HTTP/1.1

Path Parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| ClusterId | string | Yes | The ID of the region where the service is deployed. | cn-shanghai |
| ServiceName | string | Yes | The service name. For more information about how to query the service name, see ListServices. | foo |
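
The request path can be assembled from the two path parameters as in the minimal sketch below. The function name autoscaler_path is hypothetical, and percent-encoding the segments is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote


def autoscaler_path(cluster_id: str, service_name: str) -> str:
    """Build the request path for CreateServiceAutoScaler (hypothetical helper)."""
    if not cluster_id or not service_name:
        raise ValueError("ClusterId and ServiceName are both required")
    # Percent-encode each segment so that unexpected characters cannot alter the path.
    cluster = quote(cluster_id, safe="")
    service = quote(service_name, safe="")
    return f"/api/v2/services/{cluster}/{service}/autoscaler"


if __name__ == "__main__":
    print("POST", autoscaler_path("cn-shanghai", "foo"))
```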

Request parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| body | object | No | The request body. | |
| min | integer | Yes | The minimum number of instances in the service. | 2 |
| max | integer | Yes | The maximum number of instances in the service. The value of max must be greater than the value of min. | 8 |
| scaleStrategies | array<object> | Yes | The auto scaling strategies. Each element specifies a metric and the threshold that triggers scaling. | |
| scaleStrategies[] | object | No | A single auto scaling strategy. | |
| scaleStrategies[].metricName | string | Yes | The name of the metric that triggers auto scaling. Valid values: qps (the queries per second (QPS) for an individual instance), cpu (the CPU utilization), gpu[util] (the GPU utilization). | qps |
| scaleStrategies[].threshold | number | Yes | The threshold of the metric that triggers auto scaling. Scale-out is triggered when the average value of the metric for a single instance (QPS, CPU utilization, or GPU utilization, according to metricName) is greater than this threshold. | 10 |
| scaleStrategies[].service | string | No | The service for which the metric is specified. If you do not set this parameter, the current service is used by default. | demo_svc |
| behavior | object | No | The Autoscaler operation. | |
| behavior.scaleUp | object | No | The scale-out operation. | |
| behavior.scaleUp.stabilizationWindowSeconds | integer | No | The time window, in seconds, that is required before a scale-out operation is performed. Scale-out can be performed only if the specified metric exceeds the specified threshold in this time window. Default value: 0. | 0 |
| behavior.scaleDown | object | No | The scale-in operation. | |
| behavior.scaleDown.stabilizationWindowSeconds | integer | No | The time window, in seconds, that is required before a scale-in operation is performed. Scale-in can be performed only if the specified metric drops below the specified threshold in this time window. Default value: 300. | 300 |
| behavior.onZero | object | No | The operation that reduces the number of instances to 0. | |
| behavior.onZero.scaleDownGracePeriodSeconds | integer | No | The time window, in seconds, that is required before the number of instances is reduced to 0. The number of instances can be reduced to 0 only if the service receives no requests or traffic in this time window. Default value: 600. | 600 |
| behavior.onZero.scaleUpActivationReplicas | integer | No | The number of instances to create at a time when the current number of instances is 0. Default value: 1. | 1 |
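
The sketch below shows one way to assemble a request body from the parameters above and to check the documented constraint that max must be greater than min. The helper name build_autoscaler_body is hypothetical, and the nesting of scaleUp, scaleDown, and onZero under behavior is assumed from the parameter order. Sending the request (endpoint, signing, SDK) is out of scope here.

```python
import json
from typing import Any, Dict, List, Optional


def build_autoscaler_body(
    min_replicas: int,
    max_replicas: int,
    strategies: List[Dict[str, Any]],
    behavior: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble and sanity-check a CreateServiceAutoScaler body (hypothetical helper)."""
    if max_replicas <= min_replicas:
        raise ValueError("max must be greater than min")
    if not strategies:
        raise ValueError("scaleStrategies is required")
    for strategy in strategies:
        # metricName and threshold are the two required fields of each strategy.
        if "metricName" not in strategy or "threshold" not in strategy:
            raise ValueError("each strategy needs metricName and threshold")
    body: Dict[str, Any] = {
        "min": min_replicas,
        "max": max_replicas,
        "scaleStrategies": list(strategies),
    }
    if behavior is not None:
        body["behavior"] = behavior
    return body


if __name__ == "__main__":
    body = build_autoscaler_body(
        2,
        8,
        [{"metricName": "qps", "threshold": 10}],
        behavior={
            # Values below repeat the documented defaults.
            "scaleUp": {"stabilizationWindowSeconds": 0},
            "scaleDown": {"stabilizationWindowSeconds": 300},
            "onZero": {"scaleDownGracePeriodSeconds": 600, "scaleUpActivationReplicas": 1},
        },
    )
    print(json.dumps(body, indent=2))
```

Validating on the client side is optional; it only surfaces an invalid min/max pair before the call is made.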

Response elements

| Element | Type | Description | Example |
| --- | --- | --- | --- |
| | object | The response parameters. | |
| RequestId | string | The request ID. | 40325405-579C-4D82**** |
| Message | string | The returned message. | Succeed to auto scale service [foo] |

Examples

Success response

JSON format

{
  "RequestId": "40325405-579C-4D82****",
  "Message": "Succeed to auto scale service [foo]"
}
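
A caller might read the two response fields as in the following minimal sketch. The function name parse_autoscaler_response is hypothetical, and treating missing fields as empty strings is an assumption, not documented behavior.

```python
import json
from typing import Tuple


def parse_autoscaler_response(raw: str) -> Tuple[str, str]:
    """Return (RequestId, Message) from a JSON response body (hypothetical helper)."""
    data = json.loads(raw)
    # Missing fields fall back to empty strings; this fallback is an assumption.
    return data.get("RequestId", ""), data.get("Message", "")


if __name__ == "__main__":
    sample = '{"RequestId": "40325405-579C-4D82****", "Message": "Succeed to auto scale service [foo]"}'
    request_id, message = parse_autoscaler_response(sample)
    print(request_id, message)
```

Logging the RequestId alongside the message is usually helpful when a call needs to be traced later.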

Error codes

See Error Codes for a complete list.

Release notes

See Release Notes for a complete list.