
Platform for AI: Parameters of model services

Last Updated: Sep 13, 2024

You can use the EASCMD client to create services in Elastic Algorithm Service (EAS) of Platform for AI (PAI). Before you create a service, you must specify the related parameters in a JSON object. This topic describes the parameters in the JSON object.

Note

For more information about how to use the EASCMD client, see Download the EASCMD client and complete identity authentication.

Parameters

The following tables describe the parameters in the JSON object.

Parameter

Required

Description

name

Yes

The name of the service. The name must be unique in a region.

token

No

The authentication token. If you do not specify this parameter, the system automatically generates a token.

model_path

Yes

The path of the input model package. Specify the model_path and the processor_path parameters in one of the following formats:

  • HTTP URL: The input package must be in the TAR.GZ, TAR, BZ2, or ZIP format.

  • OSS path: You can specify the path of a specific object or directory in Object Storage Service (OSS). If you use an OSS path in another region, you must also specify the oss_endpoint parameter. Example:

    "model_path":"oss://wowei-beijing-tiyan/alink/",
    "oss_endpoint":"oss-cn-beijing.aliyuncs.com",
  • On-premises path: If you want to run the test command to perform debugging on your device, you can use an on-premises path.
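
For reference, a hedged sketch of the HTTP URL format; the URL is a hypothetical placeholder, not a real resource:

    "model_path":"https://example.com/models/model.tar.gz",

And an on-premises path for local debugging with the test command, also hypothetical:

    "model_path":"/home/admin/models/model_dir/",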

oss_endpoint

No

The public endpoint of OSS in a region. Example: oss-cn-beijing.aliyuncs.com. For more information, see Regions and endpoints.

Note

If you do not specify this parameter, the system uses the endpoint of OSS in the current region to download the model package or processor package. If you want to access OSS in a different region, you must specify this parameter. For example, if you want to deploy a service in the China (Hangzhou) region but want to use an OSS endpoint in the China (Beijing) region for the model_path parameter, you must specify this parameter.

model_entry

No

The entry file of the model package. The entry file can be an arbitrary file. If you do not specify this parameter, the value of the model_path parameter is used. The path of the main file in the model package is passed to the initialize() function of the processor.

model_config

No

The configuration of the model. The value is of the TEXT type. The value of this parameter is passed to the second parameter of the initialize() function in the processor.
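
For example, a minimal sketch that combines the two parameters above. The entry file name and the configuration string are hypothetical placeholders, and encoding the configuration as an escaped JSON string is an assumption about how the TEXT value is passed:

    "model_entry":"main.py",
    "model_config":"{\"language\": \"en\"}",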

processor

No

  • If you use a built-in processor, specify the processor code for this parameter. For information about the processor codes that are used by the EASCMD client, see Built-in processors.

  • If you use a custom processor, you do not need to specify this parameter. Instead, you need to specify the processor_path, processor_entry, processor_mainclass, and processor_type parameters.

processor_path

No

The path of the processor package. For more information, see the description of the model_path parameter.

processor_entry

No

The main file of the processor package, such as libprocessor.so or app.py. The main file contains the implementations of the initialize() and process() functions that are required for prediction.

If you set the processor_type parameter to cpp or python, you must specify this parameter.

processor_mainclass

No

The main class in the JAR package of the processor, such as com.aliyun.TestProcessor.

If you set the processor_type parameter to java, you must specify this parameter.

processor_type

No

The language that is used to implement the processor. Valid values:

  • cpp

  • java

  • python
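
Putting the custom-processor parameters together, a hedged sketch of a Python processor configuration; the OSS path and the entry file name are hypothetical placeholders:

    "processor_path":"oss://examplebucket/processors/processor.tar.gz",
    "processor_entry":"app.py",
    "processor_type":"python",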

warm_up_data_path

No

The path of the request file that is used for model warm-up. For more information, see Warm up model services.

runtime.enable_crash_block

No

Specifies whether to prevent a service instance from automatically restarting if the instance crashes due to a processor code exception. Valid values:

  • true: The service instance does not automatically restart. This helps you troubleshoot issues.

  • false: The service instance automatically restarts. This is the default value.
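
For example, to keep a crashed instance down so that you can troubleshoot it, you can enable the crash block. This sketch follows the runtime section of the sample JSON file at the end of this topic:

    "runtime":{
        "enable_crash_block": true
    },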

cloud

No

If you use the public resource group to deploy a service, you must use the cloud.computing.instance_type parameter to specify the instance type that is used to deploy the service.

"cloud":{
      "computing":{
          "instance_type":"ecs.gn6i-c24g1.6xlarge"
      }
  }

For more information, see Usage notes for the shared resource group.

autoscaler

No

The horizontal auto-scaling configuration of the model service. For more information, see Auto-scaling.

containers

No

The container information of the custom image that you want to use to deploy the service. For more information, see Deploy a model service by using a custom image.

storage

No

The storage information of the service.

metadata

Yes

The metadata of the service. For more information, see the Table 1. metadata parameters section of this topic.

features

No

The features of the service. For more information, see the Table 2. features parameters section of this topic.

networking

No

The call configurations of the service. For more information, see the Table 3. networking parameters section of this topic.

Table 1. metadata parameters

Parameter

Required

Description

Standard parameters

instance

Yes

The number of service instances.

workspace_id

No

The ID of the workspace. After you configure this parameter, the service can be used only within the specified PAI workspace. Example: 1405**.

cpu

No

The number of CPUs required by each instance.

memory

No

The amount of memory required by each instance. The value must be an integer. Unit: MB. For example, "memory": 4096 specifies that each instance requires 4 GB of memory.

gpu

No

The number of GPUs required by each instance.

gpu_memory

No

The amount of GPU memory required by each instance. The value must be an integer. Unit: GB.

PAI allows memory resources of one GPU to be allocated to multiple instances. If you want multiple instances to share the memory resources of one GPU, set the gpu parameter to 0. If you set the gpu parameter to 1, each instance occupies a GPU and the gpu_memory parameter does not take effect.

Important

PAI does not enable the strict isolation of GPU memory. To prevent out-of-memory (OOM) errors, make sure that the GPU memory used by each instance does not exceed the requested amount.
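
For example, a sketch of a GPU-sharing configuration in the metadata section: setting gpu to 0 and requesting 10 GB of GPU memory allows multiple instances to share one GPU. The values are illustrative:

    "metadata":{
        "gpu": 0,
        "gpu_memory": 10
    }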

gpu_core_percentage

No

The ratio of the computing power required per GPU by each instance. The value is an integer between 1 and 100. Unit: percentage. For example, if you set the parameter to 10, the system uses 10% computing power of each GPU.

This facilitates flexible scheduling of computing power and allows multiple instances to share a single GPU. You must specify the gpu_memory parameter for the gpu_core_percentage parameter to take effect.

qos

No

The quality of service (QoS) level of each instance. You can leave this parameter empty or set it to BestEffort. If you set the qos parameter to BestEffort, all instances on a node share the CPU cores of the node, and the system schedules instances based on memory and GPU resources instead of being limited by the number of CPU cores on the node. In this case, the cpu parameter specifies the maximum number of CPU cores that each instance can use, whereas memory and GPU resources are still allocated based on the values of the memory and gpu parameters.
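
For example, a sketch in which all instances on a node share its CPU cores and the cpu value acts only as a per-instance upper limit; the values are illustrative:

    "metadata":{
        "qos": "BestEffort",
        "cpu": 4,
        "memory": 8192
    }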

resource

No

The ID of the resource group.

  • If the service is deployed in the public resource group, you can ignore this parameter. In this case, the service is billed based on the pay-as-you-go billing method.

  • If the service is deployed in a dedicated resource group, set this parameter to the ID of the resource group. Example: eas-r-6dbzve8ip0xnzt****.

cuda

No

The Compute Unified Device Architecture (CUDA) version that is required by the service. When the service starts, the CUDA of the specified version is automatically mounted to the /usr/local/cuda directory of the instance.

Supported CUDA versions: 8.0, 9.0, 10.0, 10.1, 10.2, 11.0, 11.1, and 11.2. Example: "cuda":"11.2".

enable_grpc

No

Specifies whether to enable the Google Remote Procedure Call (gRPC) connection for the service gateway. Default value: false. Valid values:

  • false: disables the gRPC connection. In this case, HTTP requests are supported by default.

  • true: enables the gRPC connection.

Note

If you use a custom image to deploy the service and the image uses the gRPC server, you must set this parameter to true.

enable_webservice

No

Specifies whether to enable the webserver feature. If the feature is enabled, the system deploys the service as an AI-powered web application. Default value: false. Valid values:

  • false: disables the webserver feature.

  • true: enables the webserver feature.

Advanced parameters

Important

We recommend that you specify these parameters with caution.

rpc.batching

No

Specifies whether to enable batch processing on the server to accelerate GPU-based inference. Only built-in processors support this parameter. Default value: false. Valid values:

  • false: disables batch processing on the server.

  • true: enables batch processing on the server.

rpc.keepalive

No

The maximum processing time for a single request. If the time required for request processing exceeds this value, the server returns the timeout error code 408 and closes the connection. Default value: 5000. Unit: milliseconds.

Note

If you use a built-in processor, you must also configure the allspark parameter in your code. For more information, see Develop custom processors by using Python.

rpc.io_threads

No

The number of threads that are used by each instance to process network I/O. Default value: 4.

rpc.max_batch_size

No

The maximum size of each batch. Default value: 16. This parameter takes effect only if you set the rpc.batching parameter to true. Only built-in processors support this parameter.

rpc.max_batch_timeout

No

The maximum timeout period of each batch. Default value: 50. Unit: milliseconds. This parameter takes effect only if you set the rpc.batching parameter to true. Only built-in processors support this parameter.
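
The batching parameters above work together. The following sketch, modeled on the rpc section of the sample JSON file at the end of this topic, caps each batch at 32 requests and bounds the batching wait at 100 milliseconds; the values are illustrative:

    "metadata":{
        "rpc": {
            "batching": true,
            "max_batch_size": 32,
            "max_batch_timeout": 100
        }
    }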

rpc.max_queue_size

No

The size of the request queue. Default value: 64. When the queue is full, the server returns the error code 450 and closes the connection. This instructs the client to send requests to other instances and prevents the server from being overloaded. If the response time is excessively long, set this parameter to a smaller value to prevent request timeouts.

rpc.worker_threads

No

The number of threads that are used by each instance to process concurrent requests. Default value: 5. Only built-in processors support this parameter.

rpc.rate_limit

No

The maximum number of queries per second (QPS) that a single instance can process. The default value is 0, which specifies that QPS-based throttling is disabled.

For example, if you set this parameter to 2000, new requests are denied and status code 429 is returned when the QPS of the instance exceeds 2,000.

rolling_strategy.max_surge

No

The maximum number of additional instances that can be created for the service during a rolling update. You can set this parameter to a positive integer, which specifies the number of additional instances. You can also set this parameter to a percentage, such as 2%, which specifies the ratio of the number of additional instances to the original number of service instances. The default value is 2%. The higher the value, the faster the service is updated.

For example, if you set the number of service instances to 100 and set this parameter to 20, 20 additional instances are immediately created when you update the service.

rolling_strategy.max_unavailable

No

The maximum number of service instances that become unavailable during a rolling update. During a rolling update, the system can release existing instances to free up resources for new instances. This prevents update failures caused by insufficient resources. In the dedicated resource group, the default value of this parameter is 1. In the public resource group, the default value of this parameter is 0.

For example, if you set this parameter to N, N instances are immediately stopped when a service update starts.

Note

If idle resources are sufficient, you can set this parameter to 0. If you set this parameter to a large value, service stability may be affected. This occurs because, during the service update, the number of available instances decreases, increasing the workload of each instance. When specifying this parameter, you must consider service stability and the resources you require.
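
For example, a sketch modeled on the rolling_strategy section of the sample JSON file at the end of this topic. Passing the percentage form as a string is an assumption about the JSON encoding:

    "metadata":{
        "rolling_strategy": {
            "max_surge": "20%",
            "max_unavailable": 1
        }
    }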

eas.termination_grace_period

No

The maximum amount of time allowed for a graceful shutdown. Unit: seconds. Default value: 30.

EAS services use the rolling update policy. Before an instance is released, it enters the Terminating state and continues to process the requests that it received during the period of time that you specify, while the system switches the traffic to other instances. The instance is released after the instance finishes processing the requests. Therefore, the duration of the graceful shutdown process must be within the value of this parameter. If the time required to process requests is long, you can increase the value of this parameter to ensure that all requests that are in progress can be processed when the system updates the service.

Important

If you set this parameter to a small value, service stability may be affected. If you set this parameter to a large value, the service update may be prolonged. We recommend that you use the default value unless you have special requirements.

scheduling.spread.policy

No

The policy that is used to distribute instances during instance scheduling. Valid values:

  • host: The instances are distributed across as many nodes as possible.

  • zone: The instances are distributed across as many zones as possible.

  • default: The instances are scheduled by using the default policy and are not intentionally distributed.

rpc.enable_sigterm

No

Specifies whether the system sends a SIGTERM signal to the main process of a service instance when the instance exits. Valid values:

  • false: When a service instance enters the EXIT state, the system does not send the SIGTERM signal. This is the default value.

  • true: When a service instance enters the EXIT state, the system immediately sends the SIGTERM signal to the main process. After the signal is received, the main process in the service performs a custom graceful shutdown in the signal processing function. If the signal is not processed, the main process may exit immediately after receiving the signal, causing the graceful shutdown to fail.
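
Combining the graceful shutdown parameters described in this table, the following sketch extends the shutdown window to 60 seconds and requests a SIGTERM signal for the main process. The values are illustrative and follow the structure of the sample JSON file at the end of this topic:

    "metadata":{
        "eas.termination_grace_period": 60,
        "rpc": {
            "enable_sigterm": true
        }
    }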

Table 2. features parameters

Parameter

Required

Description

eas.aliyun.com/extra-ephemeral-storage

No

The additional amount of system disk space. If the free quota does not meet your business requirements, specify this parameter. The value must be an integer from 0 to 2000. Unit: GB.
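
For example, the features section of the sample JSON file at the end of this topic requests an additional 100 GB of system disk space:

    "features": {
        "eas.aliyun.com/extra-ephemeral-storage": "100Gi"
    }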

Table 3. networking parameters

Parameter

Required

Description

disable_internet_endpoint

No

Specifies whether to disable calls to the service over the Internet. Default value: false. If you set this parameter to true, you cannot call the service over the Internet.
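
For example, to block calls to the service over the Internet:

    "networking": {
        "disable_internet_endpoint": true
    }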

Example

Sample JSON file:

{
  "name": "test_eascmd",
  "processor": "tensorflow_cpu_1.12",
  "model_path": "oss://examplebucket/exampledir/",
  "oss_endpoint": "oss-cn-beijing.aliyuncs.com",
  "model_entry": "",
  "model_config": "",
  "processor": "",
  "processor_path": "",
  "processor_entry": "",
  "processor_mainclass": "",
  "processor_type": "",
  "warm_up_data_path": "",
  "runtime": {
    "enable_crash_block": false
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6i-c24g1.6xlarge"
    }
  },
  "autoscaler": {
    "min": 2,
    "max": 5,
    "strategies": {
      "qps": 10
    }
  },
  "storage": [
    {
      "mount_path": "/data_oss",
      "oss": {
        "endpoint": "oss-cn-shanghai-internal.aliyuncs.com",
        "path": "oss://bucket/path/"
      }
    }
  ],
  "metadata": {
    "resource": "eas-r-9lkbl2jvdm0puv****",
    "instance": 1,
    "workspace_id": 1405**,
    "gpu": 0,
    "cpu": 1,
    "memory": 2000,
    "gpu_memory": 10,
    "gpu_core_percentage": 10,
    "qos": "",
    "cuda": "11.2",
    "enable_grpc": false,
    "enable_webservice": false,
    "rpc": {
      "batching": false,
      "keepalive": 5000,
      "io_threads": 4,
      "max_batch_size": 16,
      "max_batch_timeout": 50,
      "max_queue_size": 64,
      "worker_threads": 5,
      "rate_limit": 0,
      "enable_sigterm": false
    },
    "rolling_strategy": {
      "max_surge": 1,
      "max_unavailable": 1
    },
    "eas.termination_grace_period": 30,
    "scheduling": {
      "spread": {
        "policy": "host"
      }
    }
  },
  "features": {
    "eas.aliyun.com/extra-ephemeral-storage": "100Gi"
  },
  "networking": {
    "disable_internet_endpoint": false
  }
}