
Use Alibaba Cloud ASM LLMProxy Plug-in to Ensure User Data Security for Large Models

This article introduces how to use Wasm plug-ins to enforce global protection of LLM calls within the mesh.

By Yuanyuan Ma

With the rapid development of Large Language Models (LLMs), various industries are expected to see AI adopted at scale. Since MaaS (Model as a Service) was proposed, vendors at home and abroad have launched their own model services, further accelerating the adoption of large models in real-world use. LLMs are becoming a foundational service that businesses rely on.

From the perspective of large model users, the security risk introduced by large models is an unavoidable problem. For example, if the API_KEY is leaked, it may be abused and drive up usage costs. Another example: sensitive enterprise information may be inadvertently sent to the large model service, and because that service is controlled by an external vendor, the data is no longer secure once it leaves. Given these cases, we urgently need global security protection at the platform level to avoid unnecessary losses.


As the network infrastructure in cloud-native environments, Alibaba Cloud Service Mesh (ASM) provides excellent extensibility. With custom plug-ins, users can apply fine-grained restrictions at the mesh level to every application's calls to large models (including calls from gateways and regular business Pods), preventing sensitive information from leaking.

This article will demonstrate how to use Wasm plug-ins to enforce global protection of LLM calls within the mesh. The main capabilities demonstrated are as follows:

• The Sidecar or gateway dynamically adds the API_KEY to LLM requests, so your application does not need to maintain the API_KEY.

• Configure a custom identification rule in the Sidecar or gateway to prevent LLM requests that carry sensitive information from leaving the pod and being sent to external LLM services.

• Call a private model to inspect the LLM request and determine more precisely whether it carries sensitive information, and therefore whether it should be allowed. Because the private model is only used to judge whether a request contains sensitive information, you can select the smallest model that still meets your accuracy requirements.

The code of the plug-in involved in this article is open source. Users can download and use it, or customize their own LLM plug-ins based on it. For more information, please refer to: asm-labs/wasm-llm-proxy at main · AliyunContainerService/asm-labs

Background

ASM supports users in extending the functionality of mesh proxies using Wasm. Users can develop and compile Wasm binary files using languages such as Go, Rust, and C++, then package them into images and upload them to an image repository. These images can be dynamically delivered to mesh proxies (gateways, Sidecars) to operate on requests. Wasm plug-ins are completely hot-swappable without affecting existing requests, requiring no redeployment of applications. Additionally, Wasm plug-ins run in a sandbox, providing good isolation without affecting the proxy itself. Given the lower development threshold of Wasm (compared to developing native Envoy HTTP Filters), we prioritize developing LLMProxy plug-ins based on the Go language.
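
To give a concrete picture of what such a Go plug-in looks like, the following is a minimal sketch (not the actual asm-llm-proxy source code) based on the open-source proxy-wasm-go-sdk. It parses an api_key field from the plug-in configuration and injects it into outbound request headers; the configuration field and logic are simplified assumptions that mirror the scenario in this article. A plug-in like this is typically compiled to Wasm with TinyGo and then packaged into an OCI image.

package main

import (
	"github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm"
	"github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm/types"
	"github.com/tidwall/gjson"
)

func main() {
	proxywasm.SetVMContext(&vmContext{})
}

type vmContext struct{ types.DefaultVMContext }

func (*vmContext) NewPluginContext(contextID uint32) types.PluginContext {
	return &pluginContext{}
}

type pluginContext struct {
	types.DefaultPluginContext
	apiKey string
}

// OnPluginStart parses the pluginConfig delivered by the WasmPlugin resource.
func (p *pluginContext) OnPluginStart(pluginConfigurationSize int) types.OnPluginStartStatus {
	data, err := proxywasm.GetPluginConfiguration()
	if err != nil {
		proxywasm.LogCriticalf("failed to read plugin configuration: %v", err)
		return types.OnPluginStartStatusFailed
	}
	p.apiKey = gjson.GetBytes(data, "api_key").String()
	return types.OnPluginStartStatusOK
}

func (p *pluginContext) NewHttpContext(contextID uint32) types.HttpContext {
	return &httpContext{apiKey: p.apiKey}
}

type httpContext struct {
	types.DefaultHttpContext
	apiKey string
}

// OnHttpRequestHeaders injects the API_KEY so the application never has to carry it.
func (ctx *httpContext) OnHttpRequestHeaders(numHeaders int, endOfStream bool) types.Action {
	if err := proxywasm.ReplaceHttpRequestHeader("Authorization", "Bearer "+ctx.apiKey); err != nil {
		proxywasm.LogErrorf("failed to set Authorization header: %v", err)
	}
	return types.ActionContinue
}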

Overview

In the scenario introduced in this article, users access third-party LLM services through ASM gateways or from within business Pods. The main issues addressed are: dynamically configuring the API_KEY to prevent leakage; configuring custom identification rules to prevent information from leaking to third-party LLMs; and calling a private LLM to dynamically determine whether a request should be allowed. Thanks to the unified architecture of the service mesh, ASM does not need to distinguish between gateways and regular business Pods. Therefore, this article uses a regular business Pod that initiates requests to an external LLM to demonstrate the capabilities of the ASM LLMProxy plug-in.

Before connecting to ASM, users who need to access external HTTPS services must initiate HTTPS requests directly and maintain long-lived TCP connections to the LLM service within their application. If these connections are not maintained properly, frequent connection establishment may degrade performance.

After connecting to the mesh, applications can initiate requests directly over plain HTTP. The mesh proxy upgrades the HTTP requests to HTTPS, and Envoy maintains the HTTPS connections, which reduces the number of TLS handshakes and improves performance.


This article demonstrates the final outcome: the service container initiates a request over HTTP without including the LLM's API_KEY. The request then reaches the Sidecar, which adds the API_KEY and performs a sensitive information check. Based on the result, the Sidecar either allows or denies the request; if allowed, it upgrades the request from HTTP to HTTPS and sends it to the external LLM service.

The LLM service demonstrated in this article is based on the Alibaba Cloud model service DashScope. We will call DashScope through its OpenAI-compatible HTTP interface. For more information, please refer to How to use OpenAI to call DashScope - Alibaba Cloud Help Center (content currently available in Chinese).

Demonstration

Prerequisites

• The cluster is added to the ASM instance, and the ASM instance version is 1.18 or later.

• The Sidecar injection is enabled. For more information, please refer to Configure a Sidecar injection policy.

• The model service DashScope has been activated and an available API_KEY has been obtained. For more information, please refer to How to Activate DashScope and Create an API-KEY - Alibaba Cloud Help Center (content currently available in Chinese).

1. Deploy a Client Application

The client application used in this article is Sleep. The curl command is executed directly in the Sleep Pod to send requests to an external large model, simulating requests from a user application or gateway.

Use the kubeconfig file of the ACK cluster to create the application. The YAML file of the application is as follows:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: sleep
---
apiVersion: v1
kind: Service
metadata:
  name: sleep
  labels:
    app: sleep
    service: sleep
spec:
  ports:
  - port: 80
    name: http
  selector:
    app: sleep
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
    spec:
      terminationGracePeriodSeconds: 0
      serviceAccountName: sleep
      containers:
      - name: sleep
        image: registry.cn-hangzhou.aliyuncs.com/acs/curl:8.1.2
        command: ["/bin/sleep", "infinity"]
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - mountPath: /etc/sleep/tls
          name: secret-volume
      volumes:
      - name: secret-volume
        secret:
          secretName: sleep-secret
          optional: true
---

2. Create a ServiceEntry and DestinationRule

Since the LLM service is outside the mesh, users need to manually create a ServiceEntry to register the external service with the mesh so that it can be managed. Here, we use a ServiceEntry to register DashScope with ASM. The corresponding YAML file is as follows:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: dashscope
  namespace: default
spec:
  hosts:
    - dashscope.aliyuncs.com
  ports:
    - name: http-port
      number: 80
      protocol: HTTP
      targetPort: 443  # Used together with the DestinationRule below to upgrade HTTP to HTTPS.
    - name: https-port
      number: 443
      protocol: HTTPS
  resolution: DNS

To enable the Sidecar to upgrade HTTP to HTTPS when the DashScope service is accessed on port 80, a corresponding DestinationRule also needs to be configured. The corresponding YAML file is as follows:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: dashscope
  namespace: default
spec:
  host: dashscope.aliyuncs.com
  trafficPolicy:
    portLevelSettings:
    - port:
        number: 80
      tls:
        mode: SIMPLE

After the configuration is complete, you can use the following command to test and confirm that Sidecar can upgrade the HTTP protocol to HTTPS:

kubectl exec ${sleep pod name} -- curl -v 'http://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header 'Authorization: Bearer ${dashscope API_KEY}' \
--header 'Content-Type: application/json' \
--header 'user: test' \
--data '{
    "model": "qwen-turbo",
    "messages": [
        {"role": "user", "content": "Who are you"}
    ],
    "stream": false
}'

If there is no Sidecar to perform the HTTPS upgrade, directly accessing DashScope over HTTP returns a 308 redirect. After the Sidecar performs the HTTPS upgrade, you can see the following response:

{"choices":[{"message":{"role":"assistant","content":"I am a large-scale language model from Alibaba Cloud. My name is Tongyi Qianwen. "},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":10,"completion_tokens":16,"total_tokens":26},"created":xxxxxxxx,"system_fingerprint":null,"model":"qwen-turbo","id":"xxxxxxxxxxxxxxxxxx"}

3. Configure the LLMProxy Plug-in

We create a WasmPlugin resource to apply the LLM plug-in to the Sleep Pod. The WasmPlugin YAML is as follows:

apiVersion: extensions.istio.io/v1alpha1
kind: WasmPlugin
metadata:
  name: asm-llm-proxy
  namespace: default
spec:
  imagePullPolicy: Always
  phase: AUTHN
  selector:
    matchLabels:
      app: sleep
  url: registry-cn-hangzhou.ack.aliyuncs.com/test/asm-llm-proxy:v0.2
  pluginConfig:
    api_key: ${DashScope API_KEY}
    deny_patterns:
    - .*Account.*     # Do not allow messages that contain the word "Account" to be sent to external models.
    hosts:
    - dashscope.aliyuncs.com    # The plug-in takes effect only for requests whose host is dashscope.aliyuncs.com.
    intelligent_guard:   # Configure a private LLM service to check requests for sensitive information.
      # For ease of verification, this article still calls the DashScope service to validate the request.
      api_key: ${API_KEY of the private LLM service}
      host: dashscope.aliyuncs.com
      model: qwen-turbo
      path: /compatible-mode/v1/chat/completions
      port: 80  # The HTTP port in the ServiceEntry

The plug-in configuration (pluginConfig) is mainly divided into four parts, which are described as follows:

1. api_key: the API_KEY of DashScope. With this configuration, the application does not need to include the API_KEY when it initiates an HTTP request; the plug-in dynamically adds it based on this configuration, reducing the risk of API_KEY leakage. If the API_KEY needs to be rotated, you can directly modify this configuration in the YAML file without changing the application.

2. deny_patterns: a list of regular expressions matched against the user messages in LLM requests. Matched requests are rejected. allow_patterns is also supported, in which case only matched requests are allowed through.

3. hosts: the list of hosts. Only requests destined for these hosts are processed by LLMProxy, which prevents other ordinary requests from being processed by mistake.

4. intelligent_guard: uses the OpenAI-compatible interface to call a private LLM to determine whether the request contains sensitive information. If the private large model determines that the request contains sensitive information, the request is rejected and the specific reason is returned. For demonstration purposes, this example still calls the model service DashScope. The following parameters are used to call DashScope (a sketch of how the plug-in can implement these checks follows below).

a. api_key: the API_KEY used to call DashScope.

b. host: the host of the DashScope service. The host needs to be registered in a ServiceEntry in advance; we have already configured it above and can use it directly.

c. model: the large model to call, such as qwen-turbo, qwen-max, or baichuan2-7b-chat-v1. Choose the model according to your requirements; while ensuring accuracy, try to select a model with low latency.

d. path: the path of the LLM request.

e. port: the port of the private LLM service, which must be the same as the HTTP port in the ServiceEntry.

For clusters outside the Chinese mainland, please use the overseas image address: registry-cn-hongkong.ack.aliyuncs.com/test/asm-llm-proxy:v0.2
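
To show how the checks described above can fit together inside the plug-in, here is an illustrative sketch (again, not the exact open-source implementation) of an OnHttpRequestBody handler that extends the Go skeleton from the Background section. It assumes that denyPatterns ([]*regexp.Regexp) and guardAPIKey (string) were parsed from pluginConfig in OnPluginStart, that the regexp, strconv, and strings packages are imported, and that the guard prompt and Envoy cluster name are illustrative choices; the ServiceEntry created earlier provides the outbound cluster for port 80 that the HTTP callout targets.

// Extends the httpContext from the earlier sketch. denyPatterns, guardAPIKey,
// the prompt wording, and the cluster name are assumptions for illustration only.
func (ctx *httpContext) OnHttpRequestBody(bodySize int, endOfStream bool) types.Action {
	if !endOfStream {
		return types.ActionPause // wait until the whole request body is buffered
	}
	body, err := proxywasm.GetHttpRequestBody(0, bodySize)
	if err != nil {
		return types.ActionContinue
	}
	userMsg := gjson.GetBytes(body, "messages.#.content").String()

	// 1. deny_patterns: reject locally if any regular expression matches.
	for _, re := range ctx.denyPatterns {
		if re.MatchString(userMsg) {
			proxywasm.SendHttpResponse(403, nil, []byte("request was denied by asm llm proxy"), -1)
			return types.ActionPause
		}
	}

	// 2. intelligent_guard: ask the private model for a verdict via an Envoy HTTP callout.
	prompt := `{"model":"qwen-turbo","messages":[{"role":"user","content":` +
		strconv.Quote("Does the following text contain sensitive enterprise information? Answer yes or no: "+userMsg) + `}]}`
	headers := [][2]string{
		{":method", "POST"},
		{":path", "/compatible-mode/v1/chat/completions"},
		{":authority", "dashscope.aliyuncs.com"},
		{"content-type", "application/json"},
		{"authorization", "Bearer " + ctx.guardAPIKey},
	}
	// "outbound|80||dashscope.aliyuncs.com" is the Envoy cluster created by the ServiceEntry above.
	_, err = proxywasm.DispatchHttpCall("outbound|80||dashscope.aliyuncs.com", headers,
		[]byte(prompt), nil, 60000, func(numHeaders, respBodySize, numTrailers int) {
			resp, _ := proxywasm.GetHttpCallResponseBody(0, respBodySize)
			answer := gjson.GetBytes(resp, "choices.0.message.content").String()
			if strings.Contains(strings.ToLower(answer), "yes") {
				proxywasm.SendHttpResponse(403, nil, []byte("request was denied by asm llm proxy: "+answer), -1)
				return
			}
			proxywasm.ResumeHttpRequest() // the guard model found nothing sensitive: let the request continue
		})
	if err != nil {
		return types.ActionContinue
	}
	return types.ActionPause // hold the request until the guard model answers
}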

4. Test

1.  Test that a request without an API_KEY can successfully access the LLM service:

kubectl exec ${sleep pod name} -- curl 'http://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-turbo",
    "messages": [
        {"role": "user", "content": "Who are you"}
    ],
    "stream": false
}'

You can see the output as follows:

{"choices":[{"message":{"role":"assistant","content":"I am a large-scale language model from Alibaba Cloud. My name is Tongyi Qianwen. "},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":10,"completion_tokens":16,"total_tokens":26},"created":xxxxxxx,"system_fingerprint":null,"model":"qwen-turbo","id":"xxxxxxxxx"}

2.  Test that a request carrying the sensitive word "Account" is denied:

kubectl exec ${sleep pod name} -- curl 'http://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-turbo",
    "messages": [
        {"role": "user", "content": "I like eating red bean paste zongzi. My QQ account number is 1111111"}
    ],
    "stream": false
}'

Sample output:

request was denied by asm llm proxy

3.  Test a request that carries sensitive information not matched by deny_patterns, to observe the capability of intelligent_guard:

kubectl exec ${sleep pod name} -- curl 'http://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-turbo",
    "messages": [
        {"role": "user", "content": "Our company will hold an internal high-level meeting on September 10. The theme of the meeting is how to better serve customers. Please give me an opening statement. "}
    ],
    "stream": false
}'

The demonstration results of this article are as follows:

(Figure: demonstration result of the intelligent_guard check)

As you can see, the LLM successfully identifies that the current request may contain sensitive information. The LLMProxy plug-in therefore rejects the request instead of sending it on to the external LLM service. In a production environment, the model that determines whether a request contains sensitive information should be deployed privately, which ensures that sensitive information is not leaked.

Summary

This article focuses on how to better ensure the data security of enterprises when users use external LLM services. There are two main aspects:

  1. How to ensure the security of the API_KEY when calling large models?
  2. How to ensure that there is no data leakage when calling large models?

Through ASM's LLMProxy plug-in, users can gracefully implement API_KEY rotation and precisely and intelligently restrict the leakage of sensitive information. All of this is made possible by the extensibility ASM provides through Wasm. We have open-sourced the code of this plug-in (asm-labs/wasm-llm-proxy at main · AliyunContainerService/asm-labs) and welcome everyone to try it. If you have other general requirements, you can raise an issue with us and we will continue to iterate.
