When multiple replicas of a service run behind a load balancer, per-instance rate limits cannot enforce a total request cap across the service. Global rate limiting solves this by using a centralized service to coordinate request counts across all Envoy sidecars in the mesh.
Service Mesh (ASM) V1.18.0.131 and later provides the ASMGlobalRateLimiter custom resource (CR) to configure global rate limiting for inbound traffic on services with injected sidecar proxies. This CR provides a stable, declarative API that abstracts away low-level Envoy filter configuration.
This topic covers two scenarios:
Port-level rate limiting: Limit all requests to a specific service port.
Path-level rate limiting: Limit requests to a specific URL path on a service port.
For per-instance rate limits, see Configure local rate limiting in Traffic Management Center.
Global vs. local rate limiting
Approach | How it works | Best for |
Global rate limiting | A centralized gRPC service (backed by Redis) tracks request counts across all instances. | Enforcing a hard cap across the entire service, regardless of replica count. |
Local rate limiting | Each Envoy sidecar enforces limits independently. | Protecting individual instances from overload. No external dependencies. |
How it works
Global rate limiting relies on three components:
Redis -- Stores request counters shared across all Envoy sidecars.
Rate limit service -- A gRPC service that Envoy sidecars query before forwarding requests. It checks counters in Redis and returns allow or deny decisions.
ASMGlobalRateLimiter CR -- Declarative configuration applied to the ASM control plane. ASM translates it into Envoy filter configurations and generates the rate limit service config in the CR's `status` field.
Client request --> Envoy sidecar --> Rate limit service (gRPC) --> Redis
                        |
                Allow or deny (HTTP 429)

Prerequisites
An ACK managed cluster added to your ASM instance (V1.18.0.131 or later). For more information, see Add a cluster to an ASM instance.
Automatic sidecar proxy injection enabled for the `default` namespace. For more information, see the "Enable automatic sidecar proxy injection" section of Manage global namespaces.
An ASM ingress gateway named `ingressgateway` with port 80 enabled. For more information, see Create an ingress gateway.
The sample applications `sleep` and `httpbin` deployed. For more information, see Deploy the HTTPBin application and Deploy the sleep service.
Deploy the rate limit service
Deploy Redis and the rate limit service to your ACK cluster before configuring rate limiting rules.
Create a file named `ratelimit-svc.yaml` containing the Redis and rate limit service manifests, then run the following command in the ACK cluster to deploy the services:
kubectl apply -f ratelimit-svc.yaml
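The manifest itself is not reproduced above. As a rough sketch of what `ratelimit-svc.yaml` typically contains, assuming the open-source Envoy `ratelimit` image and a companion Redis, with the rate limit rules mounted from a ConfigMap named `ratelimit-config` (all names here are assumptions):

```yaml
# Sketch only -- not the authoritative manifest for this topic.
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  ports:
  - port: 6379
  selector:
    app: redis
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:alpine
        ports:
        - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: ratelimit
spec:
  ports:
  - name: grpc
    port: 8081          # gRPC port referenced by rateLimitService in the CR
  selector:
    app: ratelimit
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ratelimit
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ratelimit
  template:
    metadata:
      labels:
        app: ratelimit
    spec:
      containers:
      - name: ratelimit
        image: envoyproxy/ratelimit:e059638d   # pin a current tag in practice
        command: ["/bin/ratelimit"]
        env:
        - name: REDIS_SOCKET_TYPE
          value: tcp
        - name: REDIS_URL
          value: redis:6379               # shared counters live in Redis
        - name: RUNTIME_ROOT
          value: /data
        - name: RUNTIME_SUBDIRECTORY
          value: ratelimit
        ports:
        - containerPort: 8081
        volumeMounts:
        - name: config-volume
          mountPath: /data/ratelimit/config   # rate limit rules from ConfigMap
      volumes:
      - name: config-volume
        configMap:
          name: ratelimit-config
```

The key design point is that the rate limit service itself is stateless: all counters live in Redis, so you can scale its replicas freely.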
Scenario 1: Rate limit all requests on a service port
This scenario limits all requests to port 8000 of the HTTPBin service to 1 request per minute.
The workflow has three steps: create the ASMGlobalRateLimiter CR on the ASM instance, sync the generated config to the rate limit service in the ACK cluster, then verify.
Step 1: Create the ASMGlobalRateLimiter CR
Create a file named `global-ratelimit-svc.yaml` containing the ASMGlobalRateLimiter definition. The following table explains the key fields. For a full field reference, see ASMGlobalRateLimiter field descriptions.
Field | Description |
workloadSelector | Selects the target workload. Set `app: httpbin` to apply rate limiting to the HTTPBin service. |
isGateway | Set to `false` because the rule targets a sidecar workload, not an ingress gateway. |
rateLimitService | Connection settings for the rate limit service: hostname, gRPC port (8081), and a 5-second timeout. |
limit | Rate limiting threshold. `unit: MINUTE` and `quota: 1` means 1 request per minute on the matched route. |
vhost | Matches the virtual host. `name: '*'` with `port: 8000` applies the rule to all requests on HTTPBin port 8000. |

Run the following command in the ASM instance to apply the CR:
kubectl apply -f global-ratelimit-svc.yaml
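As a sketch assembled from the field table above, the CR might look like the following. The `apiVersion`, the nesting under `spec.configs`, and the rate limit service hostname (`ratelimit.default.svc.cluster.local`) are assumptions; consult ASMGlobalRateLimiter field descriptions for the authoritative schema.

```yaml
apiVersion: istio.alibabacloud.com/v1beta1   # assumed API group/version
kind: ASMGlobalRateLimiter
metadata:
  name: global-svc-test      # name referenced by later kubectl get commands
  namespace: default
spec:
  workloadSelector:
    labels:
      app: httpbin           # target the HTTPBin workload
  isGateway: false           # sidecar workload, not an ingress gateway
  rateLimitService:
    host: ratelimit.default.svc.cluster.local   # assumed service hostname
    port: 8081               # gRPC port of the rate limit service
    timeout:
      seconds: 5
  limit:
    unit: MINUTE             # 1 request per minute on the matched route
    quota: 1
  configs:
  - name: httpbin
    match:
      vhost:
        name: '*'            # all requests on HTTPBin port 8000
        port: 8000
```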
Step 2: Sync the rate limit config to the data plane
After ASM processes the CR, it generates rate limit service configuration in the status.config.yaml field. You must copy this config into the rate limit service's ConfigMap in the ACK cluster. This manual sync is required because the ASM control plane and ACK data plane run in separate clusters.
Retrieve the generated config from the ASM instance:
kubectl get asmglobalratelimiter global-svc-test -o yaml

In the output, locate the `status` section.

Create a `ratelimit-config.yaml` file. Copy the `config.yaml` content from the `status` section into the `data.config.yaml` field of the ConfigMap.

Important: Copy the `config.yaml` value from the `status` section without modification. Any changes cause the rate limit service to reject the configuration.

Apply the ConfigMap to the ACK cluster:
kubectl apply -f ratelimit-config.yaml
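For orientation only, the ConfigMap has roughly the shape below. The `data.config.yaml` body shown here is a placeholder: you must paste the exact `config.yaml` generated in your CR's `status` instead, and the ConfigMap name `ratelimit-config` assumes the name mounted by the rate limit service deployment.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ratelimit-config     # must match the volume in the ratelimit deployment
  namespace: default
data:
  config.yaml: |
    # placeholder -- replace verbatim with the config.yaml from the CR's status
    domain: <generated-domain>
    descriptors:
    - key: <generated-key>
      value: <generated-value>
      rate_limit:
        unit: MINUTE
        requests_per_unit: 1
```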
Verify
Send two requests to HTTPBin port 8000 from the sleep pod:
kubectl exec -it deploy/sleep -- sh

Then run:

curl httpbin:8000/get -v
curl httpbin:8000/get -v

Expected output for the second request:
< HTTP/1.1 429
< x-envoy-ratelimited: true
< x-ratelimit-limit: 1, 1;w=60
< x-ratelimit-remaining: 0
< x-ratelimit-reset: 5
< date: Thu, 26 Oct 2023 04:23:54 GMT
< server: envoy
< content-length: 0
< x-envoy-upstream-service-time: 2
<
* Connection #0 to host httpbin left intact

The second request returns HTTP 429, confirming that global rate limiting is active: only one request per minute is allowed to reach the HTTPBin service, and the sidecar throttles inbound requests beyond that quota.
Scenario 2: Rate limit requests to a specific path
This scenario limits requests to the /headers path on HTTPBin port 8000 to 1 request per minute, while allowing unlimited access to other paths like /get.
The configuration differs depending on your ASM version.
Step 1: Create the ASMGlobalRateLimiter CR
Choose the YAML that matches your ASM version:
The following table explains the additional fields used in this scenario:
Field | Description |
| The throttling parameters to take effect. |
| (V1.19.0+) Overrides the base rate limiting threshold for requests matching specific criteria. Each override specifies its own |
| The domain name and route on which throttling takes effect. For versions earlier than V1.19.0, you can configure header matching rules for requests in the |
| (Pre-V1.19.0) Matches requests by HTTP header values within the |
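For ASM V1.19.0 and later, a path-level rule might be sketched as follows. The field names `limit_overrides`, `request_match`, and `path_match`, the `apiVersion`, and the rate limit service hostname are all assumptions drawn from the field descriptions above; verify them against ASMGlobalRateLimiter field descriptions before use.

```yaml
apiVersion: istio.alibabacloud.com/v1beta1   # assumed API group/version
kind: ASMGlobalRateLimiter
metadata:
  name: global-svc-test
  namespace: default
spec:
  workloadSelector:
    labels:
      app: httpbin
  isGateway: false
  rateLimitService:
    host: ratelimit.default.svc.cluster.local   # assumed service hostname
    port: 8081
    timeout:
      seconds: 5
  limit:
    unit: MINUTE
    quota: 100               # base threshold for the virtual host
  configs:
  - name: httpbin
    match:
      vhost:
        name: '*'
        port: 8000
    limit_overrides:         # assumed V1.19.0+ field: per-match overrides
    - request_match:
        path_match:
          exact: /headers    # only /headers gets the stricter limit
      limit:
        unit: MINUTE
        quota: 1
```

With this shape, requests to `/get` fall under the base threshold while `/headers` is capped at 1 request per minute, matching the scenario described above.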
Step 2: Sync the rate limit config to the data plane
Follow the same process as Scenario 1:
Retrieve the generated config from the ASM instance:
kubectl get asmglobalratelimiter global-svc-test -o yaml

Copy the `config.yaml` from the `status` section into a ConfigMap and apply it to the ACK cluster:

kubectl apply -f ratelimit-config.yaml
Verify
Send two requests to the `/headers` path from the sleep pod:

kubectl exec -it deploy/sleep -- sh

Then run:

curl httpbin:8000/headers -v
curl httpbin:8000/headers -v

Expected output for the second request:

< HTTP/1.1 429 Too Many Requests
< x-envoy-ratelimited: true
< x-ratelimit-limit: 1, 1;w=60
< x-ratelimit-remaining: 0
< x-ratelimit-reset: 5
< date: Thu, 26 Oct 2023 04:23:54 GMT
< server: envoy
< content-length: 0
< x-envoy-upstream-service-time: 2
<
* Connection #0 to host httpbin left intact

The second request to `/headers` is rate limited: only one request per minute is allowed to the `/headers` path of the HTTPBin service.

Confirm that other paths are unaffected:

curl httpbin:8000/get -v

Requests to `/get` succeed because only the `/headers` path is subject to rate limiting.
Monitor global rate limiting metrics
Envoy sidecars expose metrics for global rate limiting:
Metric | Type | Description |
envoy_cluster_ratelimit_ok | Counter | Total number of requests allowed by global rate limiting. |
envoy_cluster_ratelimit_over_limit | Counter | Total number of requests throttled by global rate limiting. |
envoy_cluster_ratelimit_error | Counter | Total number of requests for which the call to the rate limit service failed. |
Enable metric reporting
Configure `proxyStatsMatcher` for the sidecar proxy. Select Regular Expression Match and set the value to `.*ratelimit.*`. For more information, see the "proxyStatsMatcher" section of Configure sidecar proxies.

Redeploy the HTTPBin service to pick up the new proxy configuration. For more information, see the "(Optional) Redeploy workloads" section of Configure sidecar proxies.
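As an alternative to the console setting, the same inclusion can typically be expressed on the workload itself with Istio's standard `proxy.istio.io/config` pod annotation. This is a sketch; the console configuration described above is the documented path for ASM.

```yaml
# Pod template snippet: include Envoy rate limit stats via proxyStatsMatcher
template:
  metadata:
    annotations:
      proxy.istio.io/config: |
        proxyStatsMatcher:
          inclusionRegexps:
          - ".*ratelimit.*"
```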
Configure global throttling and perform request tests. For more information, see Scenario 1 or Configure local rate limiting in Traffic Management Center.
Run the following command to view the global throttling metrics of the HTTPBin service:
kubectl exec -it deploy/httpbin -c istio-proxy -- curl localhost:15090/stats/prometheus | grep envoy_cluster_ratelimit

Example output:

# TYPE envoy_cluster_ratelimit_ok counter
envoy_cluster_ratelimit_ok{cluster_name="inbound|80||"} 904
# TYPE envoy_cluster_ratelimit_over_limit counter
envoy_cluster_ratelimit_over_limit{cluster_name="inbound|80||"} 3223
Set up Prometheus alerts
Use Managed Service for Prometheus to collect rate limiting metrics and trigger alerts when rate limiting occurs.
Connect the ACK cluster to the Alibaba Cloud ASM component in Managed Service for Prometheus, or upgrade the component to the latest version. For more information, see Manage components.
Note: If you already use a self-managed Prometheus instance to collect ASM metrics, skip this step. For more information, see Monitor ASM instances by using a self-managed Prometheus instance.
Create a custom alert rule with the following PromQL and message template. For detailed instructions, see Create an alert rule with a custom PromQL statement. This rule fires when any service has at least one rate-limited request in a rolling one-minute window, grouped by namespace and service name.
Parameter | Example value |
PromQL statement | (sum by(namespace, service_istio_io_canonical_name) (increase(envoy_cluster_ratelimit_over_limit[1m]))) > 0 |
Alert message | Global rate limiting triggered. Namespace: {{$labels.namespace}}. Service: {{$labels.service_istio_io_canonical_name}}. Throttled requests in the last minute: {{ $value }} |
What's next
Ingress gateway rate limiting: Apply global rate limiting at the mesh edge instead of at the sidecar level. See Configure global rate limiting on an ingress gateway.
Local rate limiting: Combine with per-instance limits for defense in depth. See Configure local rate limiting in Traffic Management Center.
Field reference: Review all available fields for fine-grained control. See ASMGlobalRateLimiter field descriptions.