
Platform For AI: Call services over the Internet or a private network through a gateway

Last Updated: Mar 04, 2026

Elastic Algorithm Service (EAS) provides shared and dedicated gateways for calling deployed model inference services. You can access these services over the Internet or a private network. The process is similar for both methods. Choose the gateway type and access method that best fits your needs.

Choose a gateway type

EAS offers shared and dedicated gateways. The differences are outlined below:

| Comparison | Shared gateway | Dedicated gateway |
| --- | --- | --- |
| Public network access | Supported by default | Supported, but must be enabled first |
| Private network access | Supported by default | Supported, but must be enabled first |
| Cost | Free | Requires additional payment |
| Bandwidth | Shared | Dedicated |
| Scenarios | Services in staging environments with low traffic that do not require custom access policies | Services with high traffic that require high security, stability, and performance |
| Configuration method | Default configuration; ready to use | Must be created first and then selected during deployment. For more information, see Use a dedicated gateway |

Recommendations:

  • Use a shared gateway for development and testing environments.

  • Use a dedicated gateway for production environments.

Choose an access method

Internet endpoint

Use this method if your environment has Internet access. Requests are forwarded to your deployed service through the EAS Shared Gateway.

Scenarios:

  • Calling services from outside Alibaba Cloud

  • Local development and testing

  • Integration with external applications

VPC endpoint

Use this method when your application and the EAS service are deployed in the same region. VPC networks in the same region can establish a VPC connection for secure communication.

Scenarios:

  • The application runs on Alibaba Cloud in the same region as the EAS service.

  • Lower latency and cost are required.

  • The service should not be exposed to the Internet.

Important

Compared to calling over the Internet, calling within a VPC is faster because it avoids the network performance overhead of Internet access. It is also cheaper because private network traffic is usually free.

How to call a service

Calling an EAS service requires three key elements:

  • Service endpoint

  • Authorization token

  • A request structured according to the model's API specification

Step 1: Get the endpoint and token

After you deploy a service, the system automatically generates an endpoint and an authorization token.

Important

The console provides the base endpoint. You usually need to append the correct API path to form the complete request URL. An incorrect path is the most common cause of a 404 Not Found error.

  1. On the Inference Service tab, click the name of the target service to go to the Overview page.

  2. In the Basic Information section, click View Endpoint Information.

  3. In the Invocation Method panel, copy the endpoint and token:

    • Choose the Internet endpoint or VPC endpoint as needed.

    • The following examples use <EAS_ENDPOINT> for the endpoint and <EAS_TOKEN> for the token.

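To illustrate the note above, the complete request URL is the base endpoint from the console plus the model's API path. This is a minimal sketch; the endpoint below is a masked placeholder copied from the examples in this document, and <EAS_TOKEN> stands in for your real token.

```python
# Placeholder values from the console; replace with your own.
eas_endpoint = "http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test"
eas_token = "<EAS_TOKEN>"

# Append the API path to the base endpoint to form the complete request URL.
# A wrong or missing path here is the usual cause of a 404 Not Found error.
api_path = "/v1/chat/completions"
url = eas_endpoint.rstrip("/") + api_path
print(url)
```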

Step 2: Construct and send the request

The request format is the same whether you use an Internet endpoint or a VPC endpoint. A standard request typically includes these four core elements:

  • Method: The most common methods are POST and GET.

  • URL:

    • Format: <EAS_ENDPOINT> + API path

    • Example: http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test + /v1/chat/completions

  • Headers:

    • Authorization: <EAS_TOKEN> (required for authentication)

    • Content-Type: application/json (Usually required for POST requests)

  • Body: The format, such as JSON, depends on the deployed model's API specification.

    Important

    When calling through a gateway, the request body size cannot exceed 1 MB.
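The four elements above can be assembled programmatically before sending. The sketch below is illustrative, not part of any EAS SDK: the function name and the pre-send size check are assumptions, and the endpoint and token are placeholders.

```python
import json

MAX_BODY_BYTES = 1 * 1024 * 1024  # gateway limit on the request body size

def build_request(endpoint, token, api_path, body):
    """Assemble the URL, headers, and JSON body for an EAS service call."""
    url = endpoint.rstrip("/") + api_path
    headers = {
        "Authorization": token,              # required for authentication
        "Content-Type": "application/json",  # usually required for POST
    }
    payload = json.dumps(body)
    # Requests routed through a gateway must keep the body under 1 MB.
    if len(payload.encode("utf-8")) > MAX_BODY_BYTES:
        raise ValueError("request body exceeds the 1 MB gateway limit")
    return url, headers, payload

url, headers, payload = build_request(
    "http://<EAS_ENDPOINT>", "<EAS_TOKEN>",
    "/v1/chat/completions",
    {"model": "DeepSeek-R1-Distill-Qwen-7B",
     "messages": [{"role": "user", "content": "hello!"}]},
)
print(url)
```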

Invocation example

To call the DeepSeek-R1-Distill-Qwen-7B model service deployed with vLLM, you need the following elements:

  • Method: POST

  • Request path: <EAS_ENDPOINT>/v1/chat/completions (chat API)

  • Headers:

    • Authorization: <EAS_TOKEN>

    • Content-Type: application/json

  • Request body:

    {
        "model": "DeepSeek-R1-Distill-Qwen-7B",
        "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "hello!"
        }
        ]
    }

Code examples:

Assume <EAS_ENDPOINT> is http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test.

cURL:

curl http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: *********5ZTM1ZDczg5OT**********" \
-X POST \
-d '{
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
    {
        "role": "system",
        "content": "You are a helpful assistant."
    },
    {
        "role": "user",
        "content": "hello!"
    }
    ]
}'

Python:

import requests

# Replace with your actual endpoint.
url = 'http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions'
# Replace the Authorization value with your actual token.
headers = {
    "Content-Type": "application/json",
    "Authorization": "*********5ZTM1ZDczg5OT**********",
}
# Construct the service request based on the data format required by the specific model.
data = {
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "hello!"
        }
    ]
}
# Send the request and print the status code and response body.
resp = requests.post(url, json=data, headers=headers)
print(resp.status_code)
print(resp.json())
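Services deployed with vLLM return an OpenAI-style JSON body for the chat API. The sample below is a hypothetical, trimmed response used only to show how to extract the generated text; real responses contain additional fields, and the exact schema depends on the deployed model.

```python
# Hypothetical response body in the OpenAI chat-completion format
# returned by vLLM chat services; real responses contain more fields.
resp_json = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello! How can I help you?"}}
    ]
}

# Extract the generated text from the first choice.
answer = resp_json["choices"][0]["message"]["content"]
print(answer)
```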

For more information about calling Large Language Model (LLM) services, see LLM service invocation.

More scenarios

  • Models deployed from the Model Gallery: The Overview page for these models usually provides API call examples, including the full URL path and request format.

    cURL command

    Basic syntax: curl [options] [URL]

    Common parameters (options):

    • -X: Specifies the HTTP method, such as -X POST.

    • -H: Adds a request header, such as -H "Content-Type: application/json".

    • -d: Adds a request body, such as -d '{"key": "value"}'.


    Python code

    The console also provides Python code, using the Qwen3-Reranker-8B model as an example. Note that its URL and request body differ from those in the cURL command example, so be sure to refer to the corresponding model introduction.

  • Scenario-based deployments:

    • Services deployed using a generic processor, including TensorFlow, Caffe, and PMML: For more information, see Construct a service request based on a generic processor.

    • Other custom services: The request format is determined by the data input format you define in your custom image or code.

    • Models you trained yourself: The calling method is the same as for the original model.

FAQ

For common questions and solutions related to service invocation, see Service Invocation FAQ.