Elastic Algorithm Service (EAS) provides shared and dedicated gateways for calling deployed model inference services. You can access these services over the Internet or a private network. The process is similar for both methods. Choose the gateway type and access method that best fits your needs.
Choose a gateway type
EAS offers shared and dedicated gateways. The differences are outlined below:
| Comparison | Shared gateway | Dedicated gateway |
| --- | --- | --- |
| Internet access | Supported by default | Supported, but must be enabled first |
| Private network access | Supported by default | Supported, but must be enabled first |
| Cost | Free | Requires additional payment |
| Bandwidth | Shared | Dedicated |
| Scenarios | Low-traffic services in staging environments that do not require custom access policies | High-traffic services that require high security, stability, and performance |
| Configuration method | Default configuration, ready to use | Must be created first and then selected during deployment. For more information, see Use a dedicated gateway. |
Recommendations:
Use a shared gateway for development and testing environments.
Use a dedicated gateway for production environments.
Choose an access method
Internet endpoint
Use this method if your environment has Internet access. Requests are forwarded to your deployed service through the EAS Shared Gateway.
Scenarios:
Calling services from outside Alibaba Cloud
Local development and testing
Integration with external applications
VPC address
Use this method when your application and the EAS service are deployed in the same region. VPC networks in the same region can establish a VPC connection for secure communication.
Scenarios:
The application runs on Alibaba Cloud in the same region as the EAS service.
Lower latency and cost are required.
The service should not be exposed to the Internet.
Compared to calling over the Internet, calling within a VPC is faster because it avoids the network performance overhead of Internet access. It is also cheaper because private network traffic is usually free.
How to call a service
Calling an EAS service requires three key elements:
Service endpoint
Authorization token
A request structured according to the model's API specification
Step 1: Get the endpoint and token
After you deploy a service, the system automatically generates an endpoint and an authorization token.
The console provides the base endpoint. You usually need to append the correct API path to form the complete request URL. An incorrect path is the most common cause of a 404 Not Found error.
On the Inference Service tab, click the name of the target service to go to the Overview page.
In the Basic Information section, click View Endpoint Information.
In the Invocation Method panel, copy the endpoint and token:
Choose the Internet endpoint or VPC endpoint as needed.
The following examples use <EAS_ENDPOINT> for the endpoint and <EAS_TOKEN> for the token.
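The complete request URL is the base endpoint with the model's API path appended. A minimal sketch of this join (both values are placeholders from this guide, not a real service):

```python
# Assemble the full request URL from the base endpoint and the model's API
# path. Both values below are placeholders, not a real service address.
eas_endpoint = "http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test"
api_path = "/v1/chat/completions"  # depends on the deployed model's API

# Strip any trailing slash so the join never produces a double slash, which
# can cause a 404 Not Found error.
url = eas_endpoint.rstrip("/") + api_path
print(url)
```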

Step 2: Construct and send the request
The request format is the same whether you use an Internet endpoint or a VPC endpoint. A standard request typically includes these four core elements:
Method: The most common methods are POST and GET.
URL:
Format: <EAS_ENDPOINT> + API path
Example:
http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions
Headers:
Authorization: <EAS_TOKEN> (required for authorization)
Content-Type: application/json (usually required for POST requests)
Body: The format, such as JSON, depends on the deployed model's API specification.
Important: When calling through a gateway, the request body size cannot exceed 1 MB.
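Because the gateway rejects oversized bodies, it can help to check the serialized size before sending. A minimal sketch (the body fields are illustrative; the real format depends on the deployed model's API):

```python
import json

# Illustrative request body; the real fields depend on the deployed model's
# API specification.
body = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "hello!"}],
}

payload = json.dumps(body).encode("utf-8")
# Requests routed through a gateway must keep the body at or under 1 MB.
assert len(payload) <= 1024 * 1024, "request body exceeds the 1 MB gateway limit"
print(len(payload))
```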
Invocation example
To call the DeepSeek-R1-Distill-Qwen-7B model service deployed with vLLM, you need the following elements:
Method: POST
Request path: <EAS_ENDPOINT>/v1/chat/completions (chat API)
Headers:
Authorization: <Token>
Content-Type: application/json
Request body:
```json
{
  "model": "DeepSeek-R1-Distill-Qwen-7B",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello!"}
  ]
}
```
Code example:
Assume <EAS_ENDPOINT> is http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test.

cURL:

```shell
curl http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: *********5ZTM1ZDczg5OT**********" \
  -X POST \
  -d '{
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "hello!"}
    ]
  }'
```

Python:

```python
import requests

# Replace with your actual endpoint.
url = 'http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions'
# Set the Authorization header to your actual token.
headers = {
    "Content-Type": "application/json",
    "Authorization": "*********5ZTM1ZDczg5OT**********",
}
# Construct the request body based on the data format required by the model.
data = {
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "hello!"},
    ],
}
# Send the request.
resp = requests.post(url, json=data, headers=headers)
print(resp)
print(resp.content)
```

For more information about calling Large Language Model (LLM) services, see LLM service invocation.
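When a call fails, the HTTP status code narrows down the cause. A minimal sketch of interpreting common codes (the messages are illustrative, not actual EAS output):

```python
def describe_status(status_code: int) -> str:
    """Map an HTTP status code from the service to a likely cause (sketch)."""
    if status_code == 200:
        return "success: parse the JSON body per the model's API"
    if status_code == 404:
        return "not found: check the API path appended to the endpoint"
    if status_code == 401:
        return "unauthorized: check the Authorization token"
    return f"unexpected status {status_code}: check the service logs"

print(describe_status(404))
```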
More scenarios
Models deployed from the Model Gallery: The Overview page for these models usually provides API call examples, including the full URL path and request format.
cURL command
Basic syntax:
```shell
curl [options] [URL]
```

Common parameters (options):
-X: Specifies the HTTP method, such as -X POST.
-H: Adds a request header, such as -H "Content-Type: application/json".
-d: Adds a request body, such as -d '{"key": "value"}'.

Python code
Python code for other models, such as Qwen3-Reranker-8B, uses a different URL and request body from the cURL example above. Be sure to refer to the corresponding model introduction.

Scenario-based deployments:
Services deployed using a generic processor, including TensorFlow, Caffe, and PMML: For more information, see Construct a service request based on a generic processor.
Other custom services: The request format is determined by the data input format you define in your custom image or code.
Models you trained yourself: The calling method is the same as for the original model.
FAQ
For common questions and solutions related to service invocation, see Service Invocation FAQ.