Platform For AI: Call a service over a VPC direct connection

Last Updated: Mar 04, 2026

High-traffic inference services, such as image recognition or financial risk control, require low latency and high throughput. VPC direct connection lets clients in a virtual private cloud (VPC) connect directly to Elastic Algorithm Service (EAS) service instances, bypassing Layer 4 Server Load Balancer (SLB) and Layer 7 network forwarding. This reduces latency and increases throughput compared to routing through public gateways.

How it works

After you enable VPC direct connection for a service, EAS creates a free auxiliary Elastic Network Interface (ENI) for each service instance and attaches it to the specified VPC and vSwitch. This establishes a direct network path between your VPC and the EAS service instances, bypassing the public gateway.

EAS also provides a service discovery mechanism that returns a real-time list of IP:PORT pairs for all service instances. Clients use this list to implement client-side load balancing and failover.

[Figure: VPC direct connection architecture]

Prerequisites

Before you begin, make sure that you have:

  • An EAS service deployed with VPC direct connection enabled. For setup instructions, see Network configuration.

  • Enough available IP addresses in the vSwitch. Each ENI occupies one IP address, so the number of available IP addresses must be greater than or equal to the number of service instances.

  • Security group rules configured to allow traffic between your client and EAS service instances.

Important

Network access between clients, such as Elastic Compute Service (ECS) instances, and EAS service instances is controlled by security group rules.

  • By default, instances in the same basic security group can communicate with each other over the internal network. When you configure VPC direct connection for an EAS service, select the security group that contains the ECS instances that need to access the service.

  • If the clients and the EAS service use different security groups, add security group rules that allow traffic between them. For more information, see Allow access between instances in different security groups in a classic network.

VPC direct connection endpoint

The VPC direct connection endpoint follows this format:

{Uid}.vpc.{RegionId}.pai-eas.aliyuncs.com

Placeholder   Description                                    Example
{Uid}         Your Alibaba Cloud account ID                  123**********
{RegionId}    The region where the EAS service is deployed   cn-shanghai

Example endpoint: 123**********.vpc.cn-shanghai.pai-eas.aliyuncs.com
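
As a minimal sketch of the format above, the following helper fills in the two placeholders. The function name build_direct_endpoint is hypothetical and is not part of any EAS SDK; the account ID used in the usage line is a made-up value.

```python
def build_direct_endpoint(uid: str, region_id: str) -> str:
    """Fill in the {Uid}.vpc.{RegionId}.pai-eas.aliyuncs.com template.

    Hypothetical helper for illustration; not part of the EAS SDKs.
    """
    if not uid or not region_id:
        raise ValueError("uid and region_id must be non-empty")
    return f"{uid}.vpc.{region_id}.pai-eas.aliyuncs.com"


if __name__ == "__main__":
    # Made-up account ID, used only to show the resulting shape.
    print(build_direct_endpoint("1234567890", "cn-shanghai"))
```

Building the host name in one place keeps the account ID and region out of string literals scattered through client code.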

Call a service using an SDK (recommended)

The official EAS SDKs handle service discovery, load balancing, and failover retries automatically. Use an SDK for the most reliable VPC direct connection experience.

Python SDK

  1. Install or upgrade the SDK.

       pip install -U eas-prediction --user
  2. Call the service. The following example uses a TensorFlow request as input. For other input formats, see Using the Python SDK.

    The PredictClient constructor takes the VPC direct connection endpoint and the service name as arguments. Call set_endpoint_type(ENDPOINT_TYPE_DIRECT) to enable VPC direct connection, then call init() to initialize the client.
       #!/usr/bin/env python
       from eas_prediction import PredictClient
       from eas_prediction import TFRequest
       from eas_prediction import ENDPOINT_TYPE_DIRECT
    
       # VPC direct connection endpoint: {Uid}.vpc.{RegionId}.pai-eas.aliyuncs.com
       # Replace with your Alibaba Cloud account ID and region.
       ENDPOINT = "123**********.vpc.cn-shanghai.pai-eas.aliyuncs.com"
    
       # Replace with your EAS service name.
       SERVICE_NAME = "mnist_saved_model_example"
    
       # Replace with your service token. Obtain the token from the service details page.
       # Store tokens in environment variables or Key Management Service (KMS) rather than hardcoding them.
       TOKEN = "M2FhNjJlZDBmMzBmMzE4NjFiNzZhMmUxY2IxZjkyMDczNzAzYjFi****"
    
       if __name__ == '__main__':
           client = PredictClient(ENDPOINT, SERVICE_NAME)
           client.set_token(TOKEN)
           client.set_endpoint_type(ENDPOINT_TYPE_DIRECT)  # Enable VPC direct connection
           client.init()
    
           req = TFRequest('predict_images')
           req.add_feed('images', [1, 784], TFRequest.DT_FLOAT, [1] * 784)
           resp = client.predict(req)
           print(resp)
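
The comment in the example recommends keeping the token out of source code. One possible way to do that, sketched here under the assumption that the token is exported in an environment variable, is shown below. The variable name EAS_SERVICE_TOKEN and the helper load_token are hypothetical; the SDK does not define them.

```python
import os


def load_token(env_var: str = "EAS_SERVICE_TOKEN") -> str:
    """Read the service token from an environment variable.

    The variable name is an assumption made for this sketch; use whatever
    name your deployment tooling sets.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"environment variable {env_var} is not set")
    return token
```

The returned value can then be passed to client.set_token() in place of the hardcoded TOKEN constant.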

Java SDK

  1. Add the following Maven dependency to your pom.xml file. For the latest version, see the Maven repository. For more information, see Using the Java SDK.

       <dependency>
         <groupId>com.aliyun.openservices.eas</groupId>
         <artifactId>eas-sdk</artifactId>
         <version>2.0.20</version>
       </dependency>
  2. Call the service.

       import com.aliyun.openservices.eas.predict.http.PredictClient;
       import com.aliyun.openservices.eas.predict.http.HttpConfig;
    
       public class TestString {
           public static void main(String[] args) throws Exception {
               // Create and initialize the client once at startup.
               // Do not create a new client for each request.
               PredictClient client = new PredictClient(new HttpConfig());
    
               // Replace with your service token from the service details page.
               client.setToken("YWFlMDYyZDNmNTc3M2I3MzMwYmY0MmYwM2Y2MTYxMTY4NzBkNzdj****");
    
               // Set the VPC direct connection endpoint: {Uid}.vpc.{RegionId}.pai-eas.aliyuncs.com
               // Replace with your Alibaba Cloud account ID and region.
               client.setDirectEndpoint("123**********.vpc.cn-shanghai.pai-eas.aliyuncs.com");
    
               // Replace with your EAS service name.
               client.setModelName("scorecard_pmml_example");
    
               // Define the input string.
               String request = "[{\"money_credit\": 3000000}, {\"money_credit\": 10000}]";
               System.out.println(request);
    
               // Send the prediction request.
               try {
                   String response = client.predict(request);
                   System.out.println(response);
               } catch (Exception e) {
                   e.printStackTrace();
               }
    
               // Shut down the client when finished.
               client.shutdown();
           }
       }

Go SDK

The Go package manager downloads the SDK automatically during compilation. No separate installation is required. For more information, see Golang SDK Guide.

package main

import (
    "fmt"
    "github.com/pai-eas/eas-golang-sdk/eas"
)

func main() {
    // VPC direct connection endpoint: {Uid}.vpc.{RegionId}.pai-eas.aliyuncs.com
    // Replace with your Alibaba Cloud account ID, region, and service name.
    client := eas.NewPredictClient("123**********.vpc.cn-shanghai.pai-eas.aliyuncs.com", "scorecard_pmml_example")

    // Replace with your service token from the service details page.
    client.SetToken("YWFlMDYyZDNmNTc3M2I3MzMwYmY0MmYwM2Y2MTYxMTY4NzBkNzdj****")
    client.SetEndpointType(eas.EndpointTypeDirect)
    client.Init()

    req := "[{\"fea1\": 1, \"fea2\": 2}]"
    for i := 0; i < 100; i++ {
        resp, err := client.StringPredict(req)
        if err != nil {
            fmt.Printf("failed to predict: %v\n", err.Error())
        } else {
            fmt.Printf("%v\n", resp)
        }
    }
}

Build a custom client

If the official SDKs do not meet your requirements, implement the HTTP invocation logic yourself.

Warning

A custom client must handle service discovery, load balancing, and failover retries. Improper implementation directly affects service availability. The platform Service Level Agreement (SLA) does not cover service interruptions caused by custom client implementations. Use an official SDK whenever possible.

Service discovery API

EAS provides an HTTP API for service discovery within the configured VPC. This API returns the IP addresses, ports, and weights of all backend instances for a service.

Property           Details
URL                http://{Uid}.vpc.{RegionId}.pai-eas.aliyuncs.com/exported/apis/eas.alibaba-inc.k8s.io/v1/upstreams/{ServiceName}
Authentication     None required. Accessible only from within the configured VPC.
Polling interval   Call every 5 to 10 seconds from a background thread.

Important

The service discovery API is intended for background polling. Do not call it on every inference request, because doing so severely degrades performance.

Example request:

The following example queries a service named mnist_saved_model_example deployed in China (Hangzhou). Replace 123********** with your Alibaba Cloud account ID.

curl http://123**********.vpc.cn-hangzhou.pai-eas.aliyuncs.com/exported/apis/eas.alibaba-inc.k8s.io/v1/upstreams/mnist_saved_model_example

Example response:

{
  "correlative": [
    "mnist_saved_model_example"
  ],
  "endpoints": {
    "items": [
      {
        "app": "mnist-saved-model-example",
        "ip": "172.16.XX.XX",
        "port": 50000,
        "weight": 100
      },
      {
        "app": "mnist-saved-model-example",
        "ip": "172.16.XX.XX",
        "port": 50000,
        "weight": 100
      }
    ]
  }
}
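
The following sketch shows one way a custom client might build the discovery URL and turn a response of the shape above into a list of instances. It uses only the Python standard library. The names Instance, discovery_url, parse_upstreams, and fetch_instances are hypothetical, the 2-second timeout is an assumed value, and the parser assumes every item carries the ip, port, and weight fields shown in the example response.

```python
import json
import urllib.request
from typing import List, NamedTuple


class Instance(NamedTuple):
    ip: str
    port: int
    weight: int


def discovery_url(uid: str, region_id: str, service_name: str) -> str:
    """Build the service discovery URL from the documented template."""
    host = f"{uid}.vpc.{region_id}.pai-eas.aliyuncs.com"
    return f"http://{host}/exported/apis/eas.alibaba-inc.k8s.io/v1/upstreams/{service_name}"


def parse_upstreams(body: str) -> List[Instance]:
    """Extract (ip, port, weight) from the JSON shape in the example response."""
    data = json.loads(body)
    items = (data.get("endpoints") or {}).get("items") or []
    return [Instance(i["ip"], int(i["port"]), int(i["weight"])) for i in items]


def fetch_instances(url: str, timeout: float = 2.0) -> List[Instance]:
    """Fetch and parse the instance list.

    Raises on timeouts, connection errors, and non-200 responses, so the
    caller can decide to keep its previous list.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        if resp.status != 200:
            raise RuntimeError(f"unexpected HTTP status {resp.status}")
        return parse_upstreams(resp.read().decode("utf-8"))
```

Keeping parsing separate from the HTTP call makes the parser easy to check against a saved response without network access.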

Implementation requirements

A reliable custom client must implement the following three behaviors:

1. Cache the instance list locally and refresh periodically

Start a background thread that polls the service discovery API every 5 to 10 seconds.

  • On success (HTTP 200 with a non-empty instance list): Overwrite the local cache with the new list.

  • On failure (timeout, non-200 status, or empty list): Continue using the local cache. Do not clear the cache. This preserves service availability during transient failures.
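
The two rules above can be sketched as a small cache class. This is an illustrative outline, not the SDK's implementation. The class name InstanceCache is hypothetical, and fetch is an assumed callable that returns a list of (ip, port, weight) tuples and raises an exception on a timeout or a non-200 status.

```python
import threading
from typing import Callable, List, Tuple

Instance = Tuple[str, int, int]  # (ip, port, weight)


class InstanceCache:
    def __init__(self, fetch: Callable[[], List[Instance]], interval: float = 5.0):
        self._fetch = fetch          # assumed: raises on timeout or non-200
        self._interval = interval    # seconds between polls
        self._lock = threading.Lock()
        self._instances: List[Instance] = []
        self._stop = threading.Event()

    def refresh_once(self) -> bool:
        """Poll once. Returns True only if the cache was overwritten."""
        try:
            fresh = self._fetch()
        except Exception:
            return False             # failure: keep the existing cache
        if not fresh:
            return False             # empty list: keep the existing cache
        with self._lock:
            self._instances = list(fresh)
        return True

    def snapshot(self) -> List[Instance]:
        """Return a copy of the cached list for use by request threads."""
        with self._lock:
            return list(self._instances)

    def start(self) -> threading.Thread:
        """Start the background polling thread."""
        def loop() -> None:
            while not self._stop.is_set():
                self.refresh_once()
                self._stop.wait(self._interval)

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._stop.set()
```

Request threads read only from snapshot(), so a slow or failing discovery call never blocks an inference request.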

2. Load-balance requests across instances

Each time you send an inference request, select a target instance from the local cache. Use an algorithm such as weighted round-robin, or select instances based on your own business logic.
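
As one example of such an algorithm, the sketch below implements smooth weighted round-robin, a common variant that spreads picks evenly instead of sending bursts to the heaviest instance. It is not taken from the EAS SDKs; the class name and the (ip, port, weight) tuple layout are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

Instance = Tuple[str, int, int]  # (ip, port, weight)
Target = Tuple[str, int]         # (ip, port)


class SmoothWeightedRoundRobin:
    def __init__(self) -> None:
        # Running score per target, carried across calls to pick().
        self._current: Dict[Target, int] = {}

    def pick(self, instances: List[Instance]) -> Target:
        """Select the next target from the cached instance list."""
        if not instances:
            raise LookupError("instance list is empty")
        # Forget scores for instances that left the list.
        live = {(ip, port) for ip, port, _ in instances}
        self._current = {k: v for k, v in self._current.items() if k in live}

        total = 0
        best: Optional[Target] = None
        for ip, port, weight in instances:
            key = (ip, port)
            # Each round, every target gains its own weight.
            self._current[key] = self._current.get(key, 0) + weight
            total += weight
            if best is None or self._current[key] > self._current[best]:
                best = key
        # The winner pays back the total weight, so others catch up.
        self._current[best] -= total
        return best
```

With weights 2 and 1, six consecutive picks select the first instance four times and the second twice, interleaved rather than grouped.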

3. Retry failed requests on a different instance

If a connection to an instance fails (for example, due to an instance crash), retry the request. If the local cache contains more than one instance, select a different instance for the retry.
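
A minimal sketch of this retry rule follows. The function name call_with_failover, the send callable, and the default of three attempts are assumptions for illustration. The sketch treats any OSError (which covers refused connections and socket timeouts in the standard library) as a connection failure; which errors are safe to retry depends on whether your requests are idempotent.

```python
from typing import Callable, List, Optional, Tuple

Target = Tuple[str, int]  # (ip, port)


def call_with_failover(
    targets: List[Target],
    send: Callable[[Target], bytes],
    max_attempts: int = 3,
) -> bytes:
    """Send a request, retrying on a different instance when one fails."""
    if not targets:
        raise LookupError("no instances available")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    tried: List[Target] = []
    last_error: Optional[OSError] = None
    for _ in range(max_attempts):
        # Prefer an instance that has not failed yet. If all have been
        # tried, avoid repeating the most recent one when possible.
        untried = [t for t in targets if t not in tried]
        pool = untried or [t for t in targets if t != tried[-1]] or targets
        target = pool[0]
        try:
            return send(target)
        except OSError as exc:
            last_error = exc
            tried.append(target)
    assert last_error is not None
    raise last_error
```

In a full client, targets would come from the local cache, ordered by the load-balancing algorithm, and send would issue the HTTP request to the chosen IP:PORT.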

For a complete reference implementation, see the Python SDK source code.
