This topic describes the AI gateway capabilities of MSE cloud-native gateways.
In AI scenarios, gateway traffic has three key features that distinguish it from other service traffic:
Persistent connections: AI scenarios often use WebSocket and Server-Sent Events (SSE) protocols, resulting in a high proportion of persistent connections. Gateway configuration updates must not affect these connections or disrupt services.
High latency: The response latency of Large Language Model (LLM) inference is much higher than that of standard applications. This makes AI applications vulnerable to malicious attacks, such as concurrent slow-request floods. These attacks are low-cost for the attacker but create high overhead for the server.
High bandwidth: AI scenarios consume far more bandwidth than standard applications due to the large volume of data transmitted for LLM context over high-latency connections. If a gateway lacks efficient streaming capabilities and a memory recovery mechanism, memory usage can increase rapidly.
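To illustrate the persistent-connection traffic described above, the following Python sketch parses Server-Sent Events incrementally from arbitrary chunk boundaries, the way an LLM response arrives over a streamed connection. It is illustrative only and is not part of the gateway itself.

```python
def parse_sse_events(chunks):
    """Incrementally parse Server-Sent Events from an iterable of text chunks.

    A gateway forwarding this traffic must pass chunks through as they
    arrive instead of buffering the whole response, because events are
    produced over a long-lived connection.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A blank line terminates one SSE event.
        while "\n\n" in buffer:
            raw_event, buffer = buffer.split("\n\n", 1)
            data_lines = [line[len("data:"):].strip()
                          for line in raw_event.split("\n")
                          if line.startswith("data:")]
            if data_lines:
                yield "\n".join(data_lines)

# An LLM response streamed as SSE, split at arbitrary chunk boundaries:
stream = ["data: Hel", "lo\n\ndata: wor", "ld\n\ndata: [DONE]\n\n"]
events = list(parse_sse_events(stream))
# events == ["Hello", "world", "[DONE]"]
```

Note that event boundaries do not align with chunk boundaries, which is why the parser keeps a buffer across chunks.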
MSE cloud-native gateways have inherent advantages in handling AI gateway traffic. These advantages include the following:
Lossless hot updates for persistent connections: Unlike Nginx, which requires a reload to apply configuration changes and can drop connections in the process, MSE cloud-native gateways are built on Envoy and apply configuration changes as hot updates without dropping connections.
Security gateway capabilities: The security gateway feature of MSE cloud-native gateways provides multi-dimensional mitigation of CC (HTTP flood) attacks based on IP addresses, cookies, and other factors. For AI scenarios, it supports throttling protection based on token throughput in addition to queries per second (QPS).
Efficient streaming: MSE cloud-native gateways support full stream forwarding. The data plane is built on Envoy, which is written in C++. This design results in very low memory usage in high-bandwidth scenarios. Although memory is inexpensive compared to GPUs, poor memory management can lead to out-of-memory (OOM) errors, causing service breakdowns and significant losses.
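To make throttling by token throughput (as opposed to QPS) concrete, here is a minimal token-bucket sketch keyed by LLM token usage. The class name, parameters, and API are hypothetical and do not reflect the MSE plugin's actual implementation.

```python
import time

class TokenThroughputLimiter:
    """Token-bucket limiter that charges by LLM token usage, not request count.

    Hypothetical sketch: a cheap request consuming few tokens passes easily,
    while a single request with a huge token cost can be rejected even at
    low QPS.
    """
    def __init__(self, tokens_per_second, burst, clock=time.monotonic):
        self.rate = tokens_per_second   # sustained token throughput allowed
        self.capacity = burst           # maximum tokens consumable at once
        self.available = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self, token_cost):
        now = self.clock()
        # Refill the bucket for elapsed time, capped at burst capacity.
        self.available = min(self.capacity,
                             self.available + (now - self.last) * self.rate)
        self.last = now
        if token_cost <= self.available:
            self.available -= token_cost
            return True
        return False

limiter = TokenThroughputLimiter(tokens_per_second=100, burst=200)
limiter.allow(150)  # permitted: within the burst allowance
limiter.allow(100)  # rejected: only ~50 tokens remain in the bucket
```

Charging by token cost rather than request count is what protects LLM backends from the slow, expensive requests described earlier: each request's cost reflects the work it imposes on the model.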
Additionally, a comprehensive set of out-of-the-box AI plugins is available. These plugins cover areas such as security protection, multi-model adaptation, observability, caching, and prompt engineering. The core capabilities are as follows:
AI proxy plugin: Supports various protocols and is compatible with 15 LLM providers, covering most major large model vendors.
AI content moderation plugin: Integrates with Alibaba Cloud Content Moderation to block harmful language, misinformation, discriminatory speech, and illegal or non-compliant content.
AI statistics plugin: Calculates token throughput, generates Prometheus metrics in real time, and records related information in access logs and Tracing Analysis spans.
AI throttling plugin: Supports backend protection through throttling based on token throughput. It also lets you configure precise call quota limits for tenants.
AI developer plugin set: Provides capabilities such as LLM result caching and prompt decoration to facilitate the development of AI applications.
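As an illustration of the LLM result caching idea, the following sketch caches completions keyed by a hash of the model name and a whitespace-normalized prompt. The class, key scheme, and storage are assumptions for illustration; the plugin's actual cache keys and eviction policy may differ.

```python
import hashlib

class LLMResultCache:
    """Minimal in-memory sketch of LLM result caching.

    Hypothetical example: caching avoids re-running expensive inference
    for repeated prompts.
    """
    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(model, prompt):
        # Normalize whitespace so trivially different prompts share a key.
        normalized = " ".join(prompt.split())
        return hashlib.sha256(f"{model}:{normalized}".encode()).hexdigest()

    def get(self, model, prompt):
        return self._store.get(self._key(model, prompt))

    def put(self, model, prompt, completion):
        self._store[self._key(model, prompt)] = completion

cache = LLMResultCache()
cache.put("example-model", "What is an AI gateway?", "An AI gateway proxies LLM traffic.")
cache.get("example-model", "What  is an AI gateway?")  # hit despite extra whitespace
```

A real deployment would add an eviction policy and, for higher hit rates, could match semantically similar prompts rather than exact normalized text.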