Elastic Algorithm Service (EAS) of Platform for AI (PAI) allows you to call a service over a direct connection to a virtual private cloud (VPC) by using an official SDK or by implementing custom call logic. This topic describes the two methods that you can use to access services through VPC direct connections.
VPC direct connection
The following figure shows how a service is called over a VPC direct connection, a public endpoint, and a VPC endpoint.
After you enable VPC direct connection for the resource group where an EAS service is deployed, EAS automatically associates a secondary elastic network interface (ENI) with the security group that you specify. This helps establish a network connection between the VPC and EAS service instances. Then, you can directly access the EAS service from your VPC without passing through gateways, which eliminates the need for Layer 4 and Layer 7 load balancing. The built-in remote procedure call (RPC) technology of EAS supports HTTP protocol stacks. This significantly reduces latency when you access services with high queries per second (QPS), such as image services.
Prerequisites
The VPC direct connection feature is enabled for the dedicated resource group in which you want to deploy the service. For more information, see Configure network connectivity.
Security groups control the inbound and outbound traffic of Elastic Compute Service (ECS) instances and the network communication between ECS instances and EAS service instances. Instances in a basic security group communicate with each other over an internal network by default. To allow ECS instances to access an EAS service, select the security group where the ECS instances are deployed when you configure the VPC direct connection feature for the service. If the ECS instances and EAS service instances are not deployed in the same security group, configure security group rules to establish connections between instances.
Calling methods
EAS provides the following methods to call a service over a VPC direct connection:
EAS encapsulates call logic and provides the official SDKs for Python, Java, and Go. You can use an official SDK to call a service over a VPC direct connection.
We recommend that you use an official SDK to call services, which is the most efficient and stable method. If you want to use other languages or custom call logic, perform the steps in the "Use custom call logic" section of this topic. To implement custom call logic, you must construct service requests based on specific frameworks. For more information, see Construct a request for a TensorFlow service.
Use an official SDK
SDK for Python
To use the official SDK for Python to call a service, perform the following steps:
Run the following command to install the SDK:
pip install -U eas-prediction --user
For information about how to use the EAS SDK for Python, see SDK for Python.
Write a call program.
In the following example, a program that uses strings as input and output is used. For information about sample programs that use other input and output formats, such as TensorFlow or PyTorch tensors, see SDK for Python.
#!/usr/bin/env python
from eas_prediction import PredictClient
from eas_prediction import StringRequest
from eas_prediction import TFRequest
from eas_prediction import ENDPOINT_TYPE_DIRECT

if __name__ == '__main__':
    client = PredictClient('http://pai-eas-vpc.cn-shanghai.aliyuncs.com', 'mnist_saved_model_example')
    # Replace the value with the service token. Find the service that you want to access on the EAS page and click Invocation Method in the Service Type column to obtain the token.
    client.set_token('M2FhNjJlZDBmMzBmMzE4NjFiNzZhMmUxY2IxZjkyMDczNzAzYjFi****')
    # Access the service over a VPC direct connection.
    client.set_endpoint_type(ENDPOINT_TYPE_DIRECT)
    client.init()

    # request = StringRequest('[{}]')
    req = TFRequest('predict_images')
    req.add_feed('images', [1, 784], TFRequest.DT_FLOAT, [1] * 784)
    for x in range(0, 1000000):
        resp = client.predict(req)
        print(resp)
The PredictClient() constructor takes two input parameters: endpoint and service_name. The endpoint parameter specifies the endpoint that is used to access the service, and the service_name parameter specifies the service name. The endpoint depends on the region and uses the following format: pai-eas-vpc.{RegionId}.aliyuncs.com. For example, the VPC direct connection endpoint in the China (Shanghai) region is pai-eas-vpc.cn-shanghai.aliyuncs.com.
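The endpoint format can be captured in a small helper. The following is a minimal sketch: the function name direct_endpoint is hypothetical and is not part of the EAS SDK, and the http:// prefix follows the Python sample in this topic.

```python
def direct_endpoint(region_id: str) -> str:
    """Build the VPC direct connection endpoint for a region.

    Hypothetical helper for illustration; not part of the EAS SDK.
    """
    if not region_id:
        raise ValueError("region_id must not be empty")
    return f"http://pai-eas-vpc.{region_id}.aliyuncs.com"


print(direct_endpoint("cn-shanghai"))
# prints http://pai-eas-vpc.cn-shanghai.aliyuncs.com
```

The returned string can be passed as the endpoint parameter when you create the client.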
SDK for Java
To call a service by using the official SDK for Java, perform the following steps:
Add the following dependency to your Maven project. To view the latest version of the EAS SDK, go to MVN Repository.
<dependency>
    <groupId>com.aliyun.openservices.eas</groupId>
    <artifactId>eas-sdk</artifactId>
    <version>2.0.13</version>
</dependency>
For information about how to use the EAS SDK for Java, see SDK for Java.
Write a call program.
import com.aliyun.openservices.eas.predict.http.PredictClient;
import com.aliyun.openservices.eas.predict.http.HttpConfig;

public class TestString {
    public static void main(String[] args) throws Exception {
        // To ensure that the client object is shared as expected, create and initialize the client object when you start the service instead of creating a client object for each request.
        PredictClient client = new PredictClient(new HttpConfig());
        // Replace the value with the service token. Find the service that you want to access on the EAS page and click Invocation Method in the Service Type column to obtain the token.
        client.setToken("YWFlMDYyZDNmNTc3M2I3MzMwYmY0MmYwM2Y2MTYxMTY4NzBkNzdj****");
        // Use the setDirectEndpoint method for VPC direct connections and specify the endpoint in the following format: pai-eas-vpc.{region_id}.aliyuncs.com.
        // For example, the VPC direct connection endpoint in the China (Shanghai) region is pai-eas-vpc.cn-shanghai.aliyuncs.com.
        client.setDirectEndpoint("pai-eas-vpc.cn-shanghai.aliyuncs.com");
        // Replace the value with the name of the service.
        client.setModelName("scorecard_pmml_example");

        // Define the input strings.
        String request = "[{\"money_credit\": 3000000}, {\"money_credit\": 10000}]";
        System.out.println(request);

        // Obtain the output strings from EAS.
        try {
            String response = client.predict(request);
            System.out.println(response);
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Close the client.
        client.shutdown();
        return;
    }
}
SDK for Go
You do not need to install the EAS SDK for Go. The SDK is automatically downloaded from GitHub by the package manager of the Go language during code compilation. For information about how to use the EAS SDK for Go, see SDK for Go.
The following sample code provides an example on how to use the SDK for Go to call a service:
package main

import (
	"fmt"

	"github.com/pai-eas/eas-golang-sdk/eas"
)

func main() {
	// Specify the endpoint in the following format: pai-eas-vpc.{region_id}.aliyuncs.com.
	// For example, the VPC direct connection endpoint in the China (Shanghai) region is pai-eas-vpc.cn-shanghai.aliyuncs.com.
	// Replace the values with the region in which the service is deployed and the name of the service.
	client := eas.NewPredictClient("pai-eas-vpc.cn-shanghai.aliyuncs.com", "scorecard_pmml_example")
	// Replace the value with the service token. Find the service that you want to access on the EAS page and click Invocation Method in the Service Type column to obtain the token.
	client.SetToken("YWFlMDYyZDNmNTc3M2I3MzMwYmY0MmYwM2Y2MTYxMTY4NzBkNzdj****")
	client.SetEndpointType(eas.EndpointTypeDirect)
	client.Init()

	req := "[{\"fea1\": 1, \"fea2\": 2}]"
	for i := 0; i < 100; i++ {
		resp, err := client.StringPredict(req)
		if err != nil {
			fmt.Printf("failed to predict: %v\n", err.Error())
		} else {
			fmt.Printf("%v\n", resp)
		}
	}
}
Use custom call logic
We recommend that you use an official SDK provided by PAI for VPC direct connections. If you want to use other languages or custom call logic, you can use the following method to call a service by sending HTTP requests. EAS provides a service discovery feature to map a service to its backend addresses. The following table describes the URLs that you can use to obtain the backend addresses of services deployed in different regions.
Region | URL |
China (Shanghai) | http://pai-eas-vpc.cn-shanghai.aliyuncs.com/exported/apis/eas.alibaba-inc.k8s.io/v1/upstreams/ |
China (Beijing) | http://pai-eas-vpc.cn-beijing.aliyuncs.com/exported/apis/eas.alibaba-inc.k8s.io/v1/upstreams/ |
China (Hangzhou) | http://pai-eas-vpc.cn-hangzhou.aliyuncs.com/exported/apis/eas.alibaba-inc.k8s.io/v1/upstreams/ |
The following sample command shows how to obtain the backend addresses of the mnist_saved_model_example service that is deployed in the China (Hangzhou) region. This service has two instances.
$ curl http://pai-eas-vpc.cn-hangzhou.aliyuncs.com/exported/apis/eas.alibaba-inc.k8s.io/v1/upstreams/mnist_saved_model_example
The following result is returned:
{
  "correlative": [
    "mnist_saved_model_example"
  ],
  "endpoints": {
    "items": [
      {
        "app": "mnist-saved-model-example",
        "ip": "172.16.XX.XX",
        "port": 50000,
        "weight": 100
      },
      {
        "app": "mnist-saved-model-example",
        "ip": "172.16.XX.XX",
        "port": 50000,
        "weight": 100
      }
    ]
  }
}
The returned result contains the IP addresses, port numbers, and weights of the two instances. You can use the IP addresses to access the service from the VPC that you configured during model deployment.
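As a rough illustration, the response can be reduced to a list of (ip, port, weight) tuples on the client. The following sketch uses only the Python standard library. The function name parse_upstreams is hypothetical, and the sample body copies the masked response shown above, so the IP addresses are placeholders rather than real addresses.

```python
import json
from typing import List, Tuple


def parse_upstreams(body: str) -> List[Tuple[str, int, int]]:
    """Return (ip, port, weight) for each instance in a service discovery response.

    Hypothetical helper; it relies only on the fields shown in the sample response.
    """
    doc = json.loads(body)
    items = doc.get("endpoints", {}).get("items") or []
    return [(item["ip"], int(item["port"]), int(item["weight"])) for item in items]


# Sample body with the masked IP addresses from the response above.
sample = """
{
  "correlative": ["mnist_saved_model_example"],
  "endpoints": {
    "items": [
      {"app": "mnist-saved-model-example", "ip": "172.16.XX.XX", "port": 50000, "weight": 100},
      {"app": "mnist-saved-model-example", "ip": "172.16.XX.XX", "port": 50000, "weight": 100}
    ]
  }
}
"""

backends = parse_upstreams(sample)
for ip, port, weight in backends:
    print(ip, port, weight)
```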
To implement service discovery and establish a direct connection, perform the following steps:
Create a background thread on the client to implement service discovery at a regular interval. The most recent instance list is returned and stored in the local cache. We recommend that you set the interval to 5 seconds.
When you send an inference request, obtain the IP address and port number of an instance from the local cache. You can use the weighted round robin algorithm to obtain an IP address and port number or specify one in your code based on your business requirements.
A failed connection may indicate that the service instance is abnormal, for example, that the instance does not respond. If the service has more than one instance, the client must obtain the IP address and port number of another instance and attempt to re-establish the network connection.
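The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic used by the official SDKs: the names BackendCache and call_with_retry are hypothetical, fetch stands for any callable that queries service discovery and returns (ip, port, weight) tuples, and send stands for any callable that sends one inference request to a given instance. The selection uses smooth weighted round robin, which is one common variant of the weighted round robin algorithm.

```python
import threading
from typing import Callable, Dict, List, Optional, Set, Tuple

Backend = Tuple[str, int, int]  # (ip, port, weight)
Address = Tuple[str, int]       # (ip, port)


class BackendCache:
    """Local cache of service instances, refreshed by a background thread."""

    def __init__(self, fetch: Callable[[], List[Backend]], interval: float = 5.0):
        self._fetch = fetch        # hypothetical callable that queries service discovery
        self._interval = interval  # refresh interval in seconds (5 is recommended above)
        self._lock = threading.Lock()
        self._backends: List[Backend] = []
        self._current: Dict[Address, int] = {}  # running weights for round robin
        self._stop = threading.Event()

    def refresh(self) -> None:
        """Query service discovery once; keep the old list if the query fails."""
        try:
            backends = self._fetch()
        except Exception:
            return
        if backends:
            with self._lock:
                self._backends = list(backends)

    def start(self) -> None:
        """Fill the cache, then refresh it periodically in a daemon thread."""
        self.refresh()
        threading.Thread(target=self._loop, daemon=True).start()

    def stop(self) -> None:
        self._stop.set()

    def _loop(self) -> None:
        # Event.wait returns False on timeout, so the loop runs until stop() is called.
        while not self._stop.wait(self._interval):
            self.refresh()

    def pick(self, exclude: Optional[Set[Address]] = None) -> Optional[Backend]:
        """Select an instance by smooth weighted round robin, skipping excluded ones."""
        exclude = exclude or set()
        with self._lock:
            candidates = [b for b in self._backends if (b[0], b[1]) not in exclude]
            if not candidates:
                return None
            total = sum(weight for _, _, weight in candidates)
            for ip, port, weight in candidates:
                self._current[(ip, port)] = self._current.get((ip, port), 0) + weight
            best = max(candidates, key=lambda b: self._current[(b[0], b[1])])
            self._current[(best[0], best[1])] -= total
            return best


def call_with_retry(cache: BackendCache, send: Callable[[str, int], str],
                    max_attempts: int = 3) -> str:
    """Send one request, moving to another instance when a connection fails."""
    tried: Set[Address] = set()
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        backend = cache.pick(exclude=tried)
        if backend is None:
            break  # no untried instance is left in the cache
        ip, port, _weight = backend
        try:
            return send(ip, port)
        except OSError as exc:  # covers refused connections and socket timeouts
            last_error = exc
            tried.add((ip, port))
    raise ConnectionError("no reachable service instance") from last_error
```

In this sketch, inference requests read only from the local cache, and a failed refresh leaves the previous instance list in place.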
To prevent service access failures, take note of the following rules:
Service discovery is a bypass service. Do not call it each time you send an inference request; read instance addresses from the local cache instead.
Update the cached data only when the returned status code is 200 and the result is not empty.
When you call a service over the VPC direct connection, the load balancing and retry logic are implemented on the client. The platform does not provide a fault tolerance mechanism. You must perform the preceding steps to ensure service continuity. Otherwise, the platform cannot guarantee the performance described in the service level agreement (SLA).
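The cache update rule can be expressed as a small guard. The following is a sketch only: the function name validated_backends is hypothetical, and treating a malformed response body like an empty one is an assumption that goes slightly beyond the rule as stated.

```python
import json
from typing import List, Optional, Tuple


def validated_backends(status: int, body: str) -> Optional[List[Tuple[str, int, int]]]:
    """Return the new instance list, or None if the cache must be left unchanged."""
    if status != 200:
        return None
    try:
        items = json.loads(body)["endpoints"]["items"]
        backends = [(item["ip"], item["port"], item["weight"]) for item in items]
    except (ValueError, KeyError, TypeError):
        # Assumption: a body that cannot be parsed is handled like an empty result.
        return None
    return backends or None


# The cached list is replaced only when the guard returns a list.
cached: List[Tuple[str, int, int]] = [("172.16.XX.XX", 50000, 100)]
update = validated_backends(503, "")
if update is not None:
    cached = update
print(cached)  # the previous list is kept because the status code is not 200
```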
For more information, see SDK for Python.
References
For information about other methods to call services, see Overview.