Alibaba Cloud Service Mesh: Configure circuit breaking for inter-service traffic with ASMCircuitBreaker

Last Updated: Mar 10, 2026

When one service in a microservices call chain starts failing or responding slowly, the failure can cascade through dependent services and bring down the entire system. Circuit breaking stops this by monitoring request outcomes on each route and automatically rejecting requests to a failing service, giving it time to recover.

Service Mesh (ASM) provides route-level circuit breaking for east-west (service-to-service) traffic through the ASMCircuitBreaker CustomResourceDefinition (CRD). You can configure two types of circuit breaking rules:

  • Error rate-based -- Triggers when the percentage of failed responses on a route exceeds a threshold within a time window.

  • Slow request-based -- Triggers when too many requests exceed a response time threshold within a time window.

How circuit breaking works

Each ASM sidecar proxy independently tracks request outcomes for its own traffic. The circuit operates in two states:

  • Closed (normal): Traffic flows freely. The proxy monitors the error rate or slow request count within a sliding time window.

  • Open (tripped): When a threshold is exceeded, the proxy rejects all subsequent requests on that route and returns a configurable custom response (status code, body, and headers). After the break duration expires, the circuit closes and the proxy resumes forwarding requests.

Because each proxy evaluates thresholds independently, different proxies may trip at slightly different times for the same faulty upstream service.

Circuit breaking rules are scoped to individual routes. Rules on different routes operate independently and do not affect each other.
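The two states can be pictured with a small simulation. The following Python sketch is a toy model of one proxy's breaker for one route, not the proxy's actual implementation. The class and function names are hypothetical, the default values mirror the error rate example later in this topic, and clearing the window when the circuit opens is an assumption.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RouteBreaker:
    """Toy model of one proxy's breaker for one route (illustrative only)."""
    window_size: float = 10.0      # seconds, like window_size: 10s
    break_duration: float = 60.0   # seconds, like break_duration: 60s
    error_percent: float = 60.0    # like error_percent.value: 60
    min_request_amount: int = 5    # like min_request_amount: 5
    open_until: float = 0.0        # time at which an open circuit closes again
    samples: deque = field(default_factory=deque)  # (timestamp, failed) pairs

    def allow(self, now: float) -> bool:
        """Return True if the circuit is closed and the request may be forwarded."""
        return now >= self.open_until

    def record(self, now: float, failed: bool) -> None:
        """Record the outcome of a forwarded request and open the circuit if needed."""
        self.samples.append((now, failed))
        # Drop samples that have slid out of the time window.
        while self.samples and self.samples[0][0] <= now - self.window_size:
            self.samples.popleft()
        total = len(self.samples)
        failures = sum(1 for _, was_failed in self.samples if was_failed)
        if total >= self.min_request_amount and failures * 100 > self.error_percent * total:
            self.open_until = now + self.break_duration
            self.samples.clear()  # assumption: start from a fresh window after the break


def simulate(breaker, upstream_statuses, step=0.1, start=0.0):
    """Send one request every `step` seconds and return the status each caller sees."""
    seen = []
    now = start
    for status in upstream_statuses:
        if breaker.allow(now):
            breaker.record(now, failed=status >= 500)
            seen.append(status)
        else:
            seen.append(499)  # custom_response.status_code from the example
        now += step
    return seen


if __name__ == "__main__":
    # Prints [500, 500, 500, 500, 500, 499, 499, 499]
    print(simulate(RouteBreaker(), [500] * 8))
```

Because every sidecar proxy would hold its own instance of such a state, two proxies that see different traffic can open their circuits at different moments.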

Prerequisites

Before you begin, make sure that you have:

Step 1: Create request path routing between sleep and httpbin

Before configuring circuit breaking, create a VirtualService that defines named routes between the sleep (downstream) and httpbin (upstream) services. The circuit breaking rules target these named routes.

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

  2. Create a virtual service using one of the following methods:

ASM console

  1. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Traffic Management Center > VirtualService. Click Create.

  2. Select a namespace from the Namespace drop-down list and enter a name in the Name field. In the Gateways section, turn on the switch next to Apply To All Sidecars.

  3. In the Hosts section, click Add Host to add the httpbin service.

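  4. In the HTTP Route section, click Add Route and add the three routes that the YAML tab defines: error-route (exact match on /status/500), delay-route (prefix match on /delay), and default-route (no match condition). Set the destination of each route to the httpbin service.
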

YAML

  1. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Traffic Management Center > VirtualService. Click Create from YAML.

  2. Paste the following YAML into the code editor and click Create.

    apiVersion: networking.istio.io/v1beta1
    kind: VirtualService
    metadata:
      name: httpbin
      namespace: default
    spec:
      hosts:
        - httpbin.default.svc.cluster.local
      http:
        - match:
            - uri:
                exact: /status/500
          name: error-route
          route:
            - destination:
                host: httpbin.default.svc.cluster.local
        - match:
            - uri:
                prefix: /delay
          name: delay-route
          route:
            - destination:
                host: httpbin.default.svc.cluster.local
        - name: default-route
          route:
            - destination:
                host: httpbin.default.svc.cluster.local

This VirtualService defines three named routes:

| Request path | Match type | Route name | Behavior |
| --- | --- | --- | --- |
| /status/500 | Exact match | error-route | Always returns HTTP 500. |
| /delay/* | Prefix match | delay-route | Returns HTTP 200 after a specified delay. See delay. |
| /* | Any path | default-route | Default route for all other paths. |
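Istio evaluates the HTTP routes of a VirtualService in order and uses the first one that matches. The following Python sketch mirrors that selection for the three routes above so that you can check which route, and therefore which circuit breaking rule, a given path falls under. The ROUTES list and the match_route function are illustrative names, and the sketch ignores query strings and headers.

```python
# (route name, match type, match value) in the same order as the VirtualService.
ROUTES = [
    ("error-route", "exact", "/status/500"),
    ("delay-route", "prefix", "/delay"),
    ("default-route", None, None),  # no match block: catches everything else
]


def match_route(path: str) -> str:
    """Return the name of the first route whose match condition accepts the path."""
    for name, kind, value in ROUTES:
        if kind is None:
            return name
        if kind == "exact" and path == value:
            return name
        if kind == "prefix" and path.startswith(value):
            return name
    raise LookupError(f"no route matches {path!r}")


if __name__ == "__main__":
    for path in ("/status/500", "/status/503", "/delay/1", "/get"):
        print(path, "->", match_route(path))
```

Note that /status/503 falls through to default-route because error-route uses an exact match, which is why the verification in Step 2 can use it to show that other routes are unaffected.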

Step 2: Configure circuit breaking rules

Error rate-based circuit breaking

Error rate-based circuit breaking triggers when the server response error rate on a route exceeds a configured threshold within a time window. Use this type when your upstream service returns an elevated rate of 5xx errors.

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

  2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Traffic Management Center > Circuit Breaking and Degradation.

  3. Click Create, paste the following YAML into the code editor, and click Create.

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: ASMCircuitBreaker
    metadata:
      name: httpbin-error-circuitbreak
      namespace: default
    spec:
      configs:
        - breaker_config:
            break_duration: 60s
            custom_response:
              body: error break!
              header_to_add:
                x-envoy-overload: 'true'
              status_code: 499
            error_percent:
              value: 60
            min_request_amount: 5
            window_size: 10s
          match:
            vhost:
              name: httpbin.default.svc.cluster.local
              port: 8000
              route:
                name_match: error-route
      workloadSelector:
        labels:
          app: sleep

    The following table describes each parameter:

| Parameter | Description |
| --- | --- |
| workloadSelector.labels | Selects the downstream service workload whose sidecar proxy enforces the circuit breaking rule. app: sleep targets the sleep service. |
| break_duration | How long the circuit stays open before the proxy resumes forwarding requests. Set to 60s in this example. |
| window_size | The sliding time window for evaluating the error rate. Set to 10s. |
| error_percent.value | The error rate threshold (percentage) that triggers circuit breaking. Set to 60 -- circuit breaking trips if more than 60% of requests fail within the time window. |
| min_request_amount | The minimum number of requests in the time window before circuit breaking can activate. Set to 5 to prevent false triggers from small sample sizes. |
| custom_response | The response returned to callers while the circuit is open. In this example, body is error break!, header_to_add is x-envoy-overload: 'true', and status_code is 499. |
| match.vhost.name | The domain name of the upstream service. Set to httpbin.default.svc.cluster.local. |
| match.vhost.port | The service port of the upstream service. Set to 8000. |
| match.vhost.route.name_match | The route name from the VirtualService. Set to error-route to apply this rule only to the /status/500 path. |

Verify error rate-based circuit breaking

  1. Connect to the ACK cluster with kubectl and send 100 requests to the error-route path:

for i in {1..100};  do kubectl exec -it deploy/sleep -- curl httpbin:8000/status/500 -I | grep 'HTTP';  echo ''; sleep 0.1; done;

Expected output:

HTTP/1.1 500 Internal Server Error

HTTP/1.1 500 Internal Server Error

HTTP/1.1 500 Internal Server Error

HTTP/1.1 500 Internal Server Error

HTTP/1.1 500 Internal Server Error

HTTP/1.1 499 Unknown

HTTP/1.1 499 Unknown

HTTP/1.1 499 Unknown

...

The first five requests return HTTP 500 from httpbin. Starting from the sixth request, circuit breaking activates and the proxy returns HTTP 499 with the custom response body. The circuit stays open for 60 seconds.

  2. While circuit breaking is active on error-route, verify that other routes remain unaffected:

for i in {1..100};  do kubectl exec -it deploy/sleep -- curl httpbin:8000/status/503 -I | grep 'HTTP';  echo ''; sleep 0.1; done;

Expected output:

HTTP/1.1 503 Service Unavailable

HTTP/1.1 503 Service Unavailable

HTTP/1.1 503 Service Unavailable

...

Requests to other paths pass through normally. Circuit breaking rules are scoped to the route specified in match.vhost.route.name_match.
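With 100 requests per run, it can be easier to summarize the output than to read it line by line. The following Python sketch counts the status codes in captured output of the loops above and reports which request was the first to be rejected. The summarize function is a hypothetical helper, and treating 498 and 499 as rejected assumes the custom status codes used in this topic.

```python
import re
from collections import Counter

# Matches status lines such as "HTTP/1.1 500 Internal Server Error".
STATUS_LINE = re.compile(r"^HTTP/\d(?:\.\d)?\s+(\d{3})")

SAMPLE = """\
HTTP/1.1 500 Internal Server Error

HTTP/1.1 500 Internal Server Error

HTTP/1.1 500 Internal Server Error

HTTP/1.1 500 Internal Server Error

HTTP/1.1 500 Internal Server Error

HTTP/1.1 499 Unknown

HTTP/1.1 499 Unknown

HTTP/1.1 499 Unknown
"""


def summarize(output, rejected_codes=(498, 499)):
    """Return (count per status code, 1-based index of the first rejected request)."""
    codes = []
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            codes.append(int(match.group(1)))
    first_rejected = next(
        (i for i, code in enumerate(codes, start=1) if code in rejected_codes), None
    )
    return Counter(codes), first_rejected


if __name__ == "__main__":
    counts, first_rejected = summarize(SAMPLE)
    print(dict(counts), "first rejected request:", first_rejected)
```

For the error-route run, the first rejected request should be the sixth. For the /status/503 run, no request should be reported as rejected.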

Slow request-based circuit breaking

Slow request-based circuit breaking triggers when too many requests take longer than a configured response time threshold within a time window. Use this type when your upstream service experiences latency spikes rather than outright errors.

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

  2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Traffic Management Center > Circuit Breaking and Degradation.

  3. Click Create, paste the following YAML into the code editor, and click Create.

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: ASMCircuitBreaker
    metadata:
      name: httpbin-delay-circuitbreak
      namespace: default
    spec:
      configs:
        - breaker_config:
            break_duration: 60s
            custom_response:
              body: error break!
              header_to_add:
                x-envoy-overload: 'true'
              status_code: 498
            slow_request_rt: 0.5s
            max_slow_requests: 5
            min_request_amount: 5
            window_size: 10s
          match:
            vhost:
              name: httpbin.default.svc.cluster.local
              port: 8000
              route:
                name_match: delay-route
      workloadSelector:
        labels:
          app: sleep

    The following table describes the parameters specific to slow request-based circuit breaking. For shared parameters (workloadSelector.labels, break_duration, window_size, min_request_amount, custom_response, match.vhost), see the error rate-based parameter reference.

| Parameter | Description |
| --- | --- |
| slow_request_rt | The response time threshold that defines a slow request. Set to 0.5s -- any request that takes longer than 0.5 seconds counts as slow. |
| max_slow_requests | The maximum number of slow requests allowed in the time window before circuit breaking trips. Set to 5. |
| custom_response.status_code | Set to 498 in this example to distinguish slow request-based circuit breaking from error rate-based circuit breaking (which uses 499). |
| match.vhost.route.name_match | Set to delay-route to apply this rule to the /delay/* path, where you can control the response delay. |
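To see how these thresholds combine, the following Python sketch evaluates one time window of response times. It is a sketch under stated assumptions: the function name is hypothetical, and tripping as soon as the slow request count reaches max_slow_requests is inferred from the expected output in the verification section, where the sixth request is rejected after five slow ones. The exact comparison that the proxy performs is not documented here.

```python
def slow_request_trips(
    response_times,
    slow_request_rt=0.5,     # seconds, like slow_request_rt: 0.5s
    max_slow_requests=5,     # like max_slow_requests: 5
    min_request_amount=5,    # like min_request_amount: 5
):
    """Return True if the response times seen in one window would open the circuit."""
    # A request is slow only if it takes longer than the threshold.
    slow = sum(1 for seconds in response_times if seconds > slow_request_rt)
    enough_traffic = len(response_times) >= min_request_amount
    return enough_traffic and slow >= max_slow_requests


if __name__ == "__main__":
    print(slow_request_trips([1.0] * 5))               # five 1-second responses
    print(slow_request_trips([1.0] * 4 + [0.2] * 10))  # only four slow responses
```

Unlike the error rate rule, this check counts slow requests rather than computing a percentage, so a burst of fast requests in the same window does not dilute it.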

Verify slow request-based circuit breaking

  1. Send requests to the delay-route path with a 1-second delay, exceeding the 0.5s slow_request_rt threshold:

for i in {1..100};  do kubectl exec -it deploy/sleep -- curl httpbin:8000/delay/1 -I | grep 'HTTP';  echo ''; sleep 0.1; done;

Expected output:

HTTP/1.1 200 OK

HTTP/1.1 200 OK

HTTP/1.1 200 OK

HTTP/1.1 200 OK

HTTP/1.1 200 OK

HTTP/1.1 498 Unknown

HTTP/1.1 498 Unknown

HTTP/1.1 498 Unknown

...

The first five requests succeed with HTTP 200 but each takes over 0.5 seconds, counting as slow requests. Starting from the sixth request, circuit breaking activates and returns HTTP 498. The circuit stays open for 60 seconds.

  2. While slow request-based circuit breaking is active on delay-route, verify that error rate-based circuit breaking on error-route works independently:

for i in {1..100};  do kubectl exec -it deploy/sleep -- curl httpbin:8000/status/500 -I | grep 'HTTP';  echo ''; sleep 0.1; done;

Expected output:

HTTP/1.1 500 Internal Server Error

HTTP/1.1 500 Internal Server Error

HTTP/1.1 500 Internal Server Error

HTTP/1.1 500 Internal Server Error

HTTP/1.1 500 Internal Server Error

HTTP/1.1 499 Unknown

HTTP/1.1 499 Unknown

HTTP/1.1 499 Unknown

...

Circuit breaking rules on different routes operate independently. You can configure separate rules for routes with different traffic characteristics.

ASMCircuitBreaker parameter reference

The following table consolidates all available parameters for the ASMCircuitBreaker CRD.

| Parameter | Type | Description |
| --- | --- | --- |
| workloadSelector.labels | Map | Labels that select the downstream workload whose sidecar proxy enforces the rule. |
| match.vhost.name | String | Domain name of the upstream service (e.g., httpbin.default.svc.cluster.local). |
| match.vhost.port | Integer | Service port of the upstream service. |
| match.vhost.route.name_match | String | The VirtualService route name this rule applies to. |
| breaker_config.window_size | Duration | Sliding time window for evaluating thresholds (e.g., 10s). |
| breaker_config.min_request_amount | Integer | Minimum number of requests in the window before circuit breaking can activate. Prevents false triggers from low traffic. |
| breaker_config.break_duration | Duration | How long the circuit stays open before the proxy resumes forwarding (e.g., 60s). |
| breaker_config.error_percent.value | Integer | Error rate threshold (%) for error rate-based circuit breaking. |
| breaker_config.slow_request_rt | Duration | Response time threshold for slow request-based circuit breaking (e.g., 0.5s). |
| breaker_config.max_slow_requests | Integer | Maximum slow requests in the window before circuit breaking trips. |
| breaker_config.custom_response.status_code | Integer | HTTP status code returned while the circuit is open. |
| breaker_config.custom_response.body | String | Response body returned while the circuit is open. |
| breaker_config.custom_response.header_to_add | Map | Headers added to the response while the circuit is open. |
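A rule that names a route that does not exist in the VirtualService never matches any traffic, so a quick consistency check before you apply a rule can save debugging time. The following Python sketch checks one entry of spec.configs, represented as a dictionary, against a set of route names. The checks are illustrative assumptions, not the validation that ASM itself performs, and the duration parser handles only values with an s suffix such as 10s or 0.5s.

```python
import re

EXAMPLE_RULE = {
    "breaker_config": {
        "break_duration": "60s",
        "window_size": "10s",
        "min_request_amount": 5,
        "error_percent": {"value": 60},
    },
    "match": {
        "vhost": {
            "name": "httpbin.default.svc.cluster.local",
            "port": 8000,
            "route": {"name_match": "error-route"},
        }
    },
}


def parse_seconds(text):
    """Parse a duration such as '10s' or '0.5s' (only the 's' suffix is handled)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)s", str(text))
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    return float(match.group(1))


def check_rule(rule, route_names):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    config = rule["breaker_config"]
    route = rule["match"]["vhost"]["route"]["name_match"]
    if route not in route_names:
        problems.append(f"route {route!r} is not defined in the VirtualService")
    for key in ("window_size", "break_duration"):
        try:
            parse_seconds(config[key])
        except (KeyError, ValueError) as exc:
            problems.append(f"{key}: {exc}")
    has_error_rule = "error_percent" in config
    has_slow_rule = "slow_request_rt" in config and "max_slow_requests" in config
    if not (has_error_rule or has_slow_rule):
        problems.append("no error rate or slow request threshold is configured")
    if has_error_rule and not 0 <= config["error_percent"]["value"] <= 100:
        problems.append("error_percent.value must be between 0 and 100")
    return problems


if __name__ == "__main__":
    print(check_rule(EXAMPLE_RULE, {"error-route", "delay-route", "default-route"}))
```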

Monitor circuit breaking metrics

For ASM instances V1.22.6.28 and later, ASMCircuitBreaker exposes the following Prometheus metric:

| Metric | Type | Description |
| --- | --- | --- |
| envoy_asm_circuit_breaker_total_broken_requests | Counter | Total number of requests rejected by circuit breaking. |

To enable metric collection:

  1. Configure proxyStatsMatcher for the sidecar proxy. Select Regular Expression Match and set the value to .*circuit_breaker.*. See proxyStatsMatcher.

  2. Redeploy the sleep service to apply the new proxy configuration. The rule is enforced by the sidecar proxy of the workload that workloadSelector selects (app: sleep), so that proxy reports the metric. See Redeploy workloads.

  3. Configure circuit breaking rules and trigger circuit breaking again by repeating Step 1 and Step 2.

  4. Query the circuit breaking metrics:

kubectl exec -it deploy/sleep -c istio-proxy -- curl localhost:15090/stats/prometheus | grep asm_circuit_breaker

Expected output:

# TYPE envoy_asm_circuit_breaker_total_broken_requests counter
envoy_asm_circuit_breaker_total_broken_requests{cluster="outbound|8000||httpbin.default.svc.cluster.local",uuid="af7cf7ad-67e8-49c5-b5fe-xxxxxxxxx"} 1430
# TYPE envoy_total_asm_circuit_breakers gauge
envoy_total_asm_circuit_breakers{} 1
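If you want to process this output in a script, the text format is simple enough to parse directly. The following Python sketch extracts the rejected request counter per upstream cluster from output like the above. The function names are hypothetical, and the parser is deliberately minimal: it does not handle escaped quotes in label values or timestamps after the sample value.

```python
import re

METRIC = "envoy_asm_circuit_breaker_total_broken_requests"

# name, optional {labels}, then the sample value.
SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$"
)
LABEL = re.compile(r'(\w+)="([^"]*)"')

EXAMPLE_OUTPUT = """\
# TYPE envoy_asm_circuit_breaker_total_broken_requests counter
envoy_asm_circuit_breaker_total_broken_requests{cluster="outbound|8000||httpbin.default.svc.cluster.local",uuid="af7cf7ad-67e8-49c5-b5fe-xxxxxxxxx"} 1430
# TYPE envoy_total_asm_circuit_breakers gauge
envoy_total_asm_circuit_breakers{} 1
"""


def parse_samples(text):
    """Return a list of (metric name, labels dict, value) for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # TYPE / # HELP comments
        match = SAMPLE_LINE.match(line)
        if match:
            labels = dict(LABEL.findall(match.group("labels") or ""))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def broken_requests_by_cluster(text):
    """Sum the rejected request counter per value of the cluster label."""
    totals = {}
    for name, labels, value in parse_samples(text):
        if name == METRIC:
            cluster = labels.get("cluster", "")
            totals[cluster] = totals.get(cluster, 0.0) + value
    return totals


if __name__ == "__main__":
    print(broken_requests_by_cluster(EXAMPLE_OUTPUT))
```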

Set up Prometheus alerts for circuit breaking

After metric collection is enabled, configure Prometheus alerts to get notified when circuit breaking occurs. The following example uses Managed Service for Prometheus.

  1. Connect the data plane cluster to the Alibaba Cloud ASM component in Managed Service for Prometheus, or upgrade the component to the latest version to start collecting circuit breaking metrics. See Component management.

    If you already collect ASM metrics with a self-managed Prometheus instance, skip this step. See Monitor ASM instances by using a self-managed Prometheus instance.
  2. Create an alert rule for circuit breaking events. See Use a custom PromQL statement to create an alert rule. Configure the following key parameters:

| Parameter | Example | Description |
| --- | --- | --- |
| Custom PromQL statement | (sum by(cluster, namespace) (increase(envoy_asm_circuit_breaker_total_broken_requests[1m]))) > 0 | Counts requests rejected by circuit breaking in the last minute, grouped by namespace and service. Fires when the count exceeds 0. |
| Alert message | Service-level circuit breaking occurred. Namespace: {{$labels.namespace}}, Service that triggers circuit breaking: {{$labels.cluster}}. Requests rejected by circuit breaking in the last minute: {{ $value }} | Includes the namespace, the service that triggered circuit breaking, and the rejection count. |
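The PromQL statement can be read as "how much did the counter grow in the last minute, per service, and is that growth above zero". The following Python sketch reproduces that reasoning for two counter readings taken one minute apart. It is a simplification: the real increase() function also extrapolates to the window boundaries, and the label tuples and function names here are hypothetical.

```python
from collections import defaultdict


def increase_by_service(previous, current):
    """Sum per-series counter growth, grouped by (cluster, namespace).

    Both arguments map (cluster, namespace, uuid) to a counter reading.
    """
    totals = defaultdict(float)
    for labels, value in current.items():
        before = previous.get(labels, 0.0)
        # A counter that went down was reset (for example, the proxy restarted),
        # so count everything accumulated since the reset.
        delta = value - before if value >= before else value
        cluster, namespace, _uuid = labels
        totals[(cluster, namespace)] += delta
    return dict(totals)


def firing(previous, current):
    """Return the groups for which the alert condition (> 0) holds."""
    return {
        group: growth
        for group, growth in increase_by_service(previous, current).items()
        if growth > 0
    }


if __name__ == "__main__":
    series = ("outbound|8000||httpbin.default.svc.cluster.local", "default", "proxy-a")
    print(firing({series: 1400.0}, {series: 1430.0}))
```

Each entry that firing returns corresponds to one alert, with the group supplying the namespace and cluster labels and the growth supplying the value shown in the alert message.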