
Alibaba Cloud Service Mesh: Configure the connectionPool field to implement circuit breaking

Last Updated: Oct 22, 2024

Circuit breaking is a traffic management mechanism that protects your system from further damage in the event of a failure or overload. In traditional Java services, frameworks such as Resilience4j can be used to implement circuit breaking. Compared with these traditional approaches, Istio allows you to implement circuit breaking at the network level without integrating it into the application code of each service. You can configure the connectionPool field to implement circuit breaking, which improves system stability and reliability and protects services from being affected by abnormal requests.

Prerequisites

A Container Service for Kubernetes (ACK) cluster is added to your Service Mesh (ASM) instance. For more information, see Add a cluster to an ASM instance.

connectionPool settings

Before you enable the circuit breaking feature, you must create a destination rule to configure circuit breaking for the desired destination service. For more information about the fields in a destination rule, see Destination Rule.

The connectionPool field defines parameters related to circuit breaking. The following table describes the parameters of the connectionPool field.

| Parameter | Type | Required | Description | Default value |
| --- | --- | --- | --- | --- |
| tcp.maxConnections | int32 | No | The maximum number of HTTP1 or TCP connections to a destination host. The limit takes effect on the sidecar proxies on both the client and server sides: a single client pod cannot initiate more than the configured number of connections to the server, and a single server pod cannot accept more than the configured number of connections. The total number of connections that the application services on the server can accept is calculated as follows: min(number of client pods, number of server pods) × maxConnections. | 2³²-1 |
| http.http1MaxPendingRequests | int32 | No | The maximum number of requests that can be queued while waiting for a ready connection from the connection pool. | 1024 |
| http.http2MaxRequests | int32 | No | The maximum number of active requests to a backend service. | 1024 |
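
For example, if maxConnections is set to 5 and the mesh runs three client pods and two server pods, the formula above yields min(3, 2) × 5 = 10 connections that the destination service can accept in total. The following sketch shows how the three parameters fit together in the connectionPool section of a destination rule; the resource name, host, and values are illustrative assumptions rather than recommended settings.

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: example-destination-rule      # illustrative name
spec:
  host: example-service               # illustrative destination service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 5             # maximum number of HTTP1 or TCP connections
      http:
        http1MaxPendingRequests: 1    # maximum number of queued requests
        http2MaxRequests: 100         # maximum number of active requests (illustrative value)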

It is clear how these parameters work in a simple scenario where only one client and one destination service instance exist. In Kubernetes environments, an instance is equivalent to a pod. However, in production environments, we are more likely to see the following scenarios:

  • One client instance and multiple destination service instances

  • Multiple client instances and a single destination service instance

  • Multiple client instances and multiple destination service instances

In different scenarios, you need to adjust the values of these parameters based on your business requirements to ensure that the connection pool can adapt to high-load and complex environments and provide good performance and reliability. The following section provides examples of how to configure a connection pool in the preceding scenarios to help you understand the constraints that the configuration imposes on the client and the server. Then, you can configure a circuit breaking policy that suits your production environment.

Configuration examples

In this topic, two Python scripts are created: one for the destination service (server) and the other for the calling service (client).

  • The server script creates a Flask application and defines a single endpoint at the /hello route. When the endpoint is accessed, the server sleeps for 5 seconds and then returns a "hello world!" string.

    Server script:

    #!/usr/bin/env python3
    from flask import Flask
    import time
    
    app = Flask(__name__)
    
    @app.route('/hello')
    def get():
        # Simulate a slow backend: each request takes about 5 seconds to complete.
        time.sleep(5)
        return 'hello world!'
    
    if __name__ == '__main__':
        app.run(debug=True, host='0.0.0.0', port=9080, threaded=True)
  • The client script calls the server endpoint by sending 10 requests in parallel at a time, and then sleeps for some time before sending the next batch of 10 requests. The script does this in an infinite loop. To ensure that all of the client pods send a batch of 10 requests at the same time when multiple client pods are running, batches of 10 requests are sent at the 0th, 20th, and 40th second of every minute (according to the system time) in this example.

    Client script:

    #!/usr/bin/env python3
    import requests
    import time
    import sys
    from datetime import datetime
    import _thread
    
    def timedisplay(t):
      return t.strftime("%H:%M:%S")
    
    def get(url):
      try:
        stime = datetime.now()
        start = time.time()
        response = requests.get(url)
        etime = datetime.now()
        end = time.time()
        elapsed = end - start
        sys.stderr.write("Status: " + str(response.status_code) + ", Start: " + timedisplay(stime) + ", End: " + timedisplay(etime) + ", Elapsed Time: " + str(elapsed) + "\n")
        sys.stderr.flush()
      except Exception as myexception:
        sys.stderr.write("Exception: " + str(myexception) + "\n")
        sys.stderr.flush()
    
    # Initial delay before the first batch of requests is sent.
    time.sleep(30)
    
    while True:
      sc = int(datetime.now().strftime('%S'))
      time_range = [0, 20, 40]
    
      # Send a batch only at the 0th, 20th, and 40th second of every minute.
      if sc not in time_range:
        time.sleep(1)
        continue
    
      sys.stderr.write("\n----------Info----------\n")
      sys.stderr.flush()
    
      # Send 10 requests in parallel
      for i in range(10):
        _thread.start_new_thread(get, ("http://circuit-breaker-sample-server:9080/hello", ))
    
      # Sleep past the current second so that the same batch is not sent twice.
      time.sleep(2)

Deploy sample applications

  1. Create a YAML file that contains the following content and then run the kubectl apply -f ${name of the YAML file}.yaml command to deploy sample applications.

    YAML code:

    ##################################################################################################
    #  circuit-breaker-sample-server services
    ##################################################################################################
    apiVersion: v1
    kind: Service
    metadata:
      name: circuit-breaker-sample-server
      labels:
        app: circuit-breaker-sample-server
        service: circuit-breaker-sample-server
    spec:
      ports:
      - port: 9080
        name: http
      selector:
        app: circuit-breaker-sample-server
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: circuit-breaker-sample-server
      labels:
        app: circuit-breaker-sample-server
        version: v1
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: circuit-breaker-sample-server
          version: v1
      template:
        metadata:
          labels:
            app: circuit-breaker-sample-server
            version: v1
        spec:
          containers:
          - name: circuit-breaker-sample-server
            image: registry.cn-hangzhou.aliyuncs.com/acs/istio-samples:circuit-breaker-sample-server.v1
            imagePullPolicy: Always
            ports:
            - containerPort: 9080
    ---
    ##################################################################################################
    #  circuit-breaker-sample-client services
    ##################################################################################################
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: circuit-breaker-sample-client
      labels:
        app: circuit-breaker-sample-client
        version: v1
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: circuit-breaker-sample-client
          version: v1
      template:
        metadata:
          labels:
            app: circuit-breaker-sample-client
            version: v1
        spec:
          containers:
          - name: circuit-breaker-sample-client
            image: registry.cn-hangzhou.aliyuncs.com/acs/istio-samples:circuit-breaker-sample-client.v1
            imagePullPolicy: Always
            
  2. Run the following command to view the client and server pods:

    kubectl get po | grep circuit

    Expected output:

    circuit-breaker-sample-client-d4f64d66d-fwrh4   2/2     Running   0             1m22s
    circuit-breaker-sample-server-6d6ddb4b-gcthv    2/2     Running   0             1m22s

If no limits are defined in the destination rule, the server can handle 10 concurrent requests from the client. Therefore, the response code returned by the server is always 200. The following code block shows the logs of the client:

----------Info----------
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.016539812088013
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.012614488601685
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.015984535217285
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.015599012374878
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.012874364852905
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.018714904785156
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.010422468185425
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.012431621551514
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.011001348495483
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.01432466506958

Configure the connectionPool field

To enable circuit breaking for a destination service by using the service mesh technology, you only need to define a corresponding destination rule for the destination service.

Use the following content to create a destination rule for the sample destination service. For more information, see Manage destination rules. This destination rule limits the number of TCP connections to the destination service to 5.

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: circuit-breaker-sample-server
spec:
  host: circuit-breaker-sample-server
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 5

Scenario 1: One client pod and one pod for the destination service

  1. Start the client pod and monitor logs.

    We recommend that you restart the client so that the statistics start from a clean state. You can see logs similar to the following:

    ----------Info----------
    Status: 200, Start: 02:49:40, End: 02:49:45, Elapsed Time: 5.0167787075042725
    Status: 200, Start: 02:49:40, End: 02:49:45, Elapsed Time: 5.011920690536499
    Status: 200, Start: 02:49:40, End: 02:49:45, Elapsed Time: 5.017078161239624
    Status: 200, Start: 02:49:40, End: 02:49:45, Elapsed Time: 5.018405437469482
    Status: 200, Start: 02:49:40, End: 02:49:45, Elapsed Time: 5.018689393997192
    Status: 200, Start: 02:49:40, End: 02:49:50, Elapsed Time: 10.018936395645142
    Status: 200, Start: 02:49:40, End: 02:49:50, Elapsed Time: 10.016417503356934
    Status: 200, Start: 02:49:40, End: 02:49:50, Elapsed Time: 10.019930601119995
    Status: 200, Start: 02:49:40, End: 02:49:50, Elapsed Time: 10.022735834121704
    Status: 200, Start: 02:49:40, End: 02:49:55, Elapsed Time: 15.02303147315979

    The preceding logs show that all requests are successful, but only five requests in each batch are answered in about 5 seconds. The remaining requests take 10 seconds or longer, which implies that using only tcp.maxConnections causes excess requests to be queued while they wait for connections to be freed up. By default, the number of requests that can be queued is 2³² - 1.

  2. Use the following content to update the destination rule to allow only one pending request. For more information, see Manage destination rules.

    To realize circuit breaking (fail-fast), you must also set http.http1MaxPendingRequests to limit the number of requests that can be queued. The default value of the http1MaxPendingRequests parameter is 1024. If you set the value to 0, it falls back to the default value. Therefore, you must set the value to at least 1.

    apiVersion: networking.istio.io/v1beta1
    kind: DestinationRule
    metadata:
      name: circuit-breaker-sample-server
    spec:
      host: circuit-breaker-sample-server
      trafficPolicy:
        connectionPool:
          tcp:
            maxConnections: 5
          http:
            http1MaxPendingRequests: 1
  3. Restart the client pod to obtain correct statistics and monitor logs.

    Sample logs:

    ----------Info----------
    Status: 503, Start: 02:56:40, End: 02:56:40, Elapsed Time: 0.005339622497558594
    Status: 503, Start: 02:56:40, End: 02:56:40, Elapsed Time: 0.007254838943481445
    Status: 503, Start: 02:56:40, End: 02:56:40, Elapsed Time: 0.0044133663177490234
    Status: 503, Start: 02:56:40, End: 02:56:40, Elapsed Time: 0.008964776992797852
    Status: 200, Start: 02:56:40, End: 02:56:45, Elapsed Time: 5.018309116363525
    Status: 200, Start: 02:56:40, End: 02:56:45, Elapsed Time: 5.017424821853638
    Status: 200, Start: 02:56:40, End: 02:56:45, Elapsed Time: 5.019804954528809
    Status: 200, Start: 02:56:40, End: 02:56:45, Elapsed Time: 5.01643180847168
    Status: 200, Start: 02:56:40, End: 02:56:45, Elapsed Time: 5.025975227355957
    Status: 200, Start: 02:56:40, End: 02:56:50, Elapsed Time: 10.01716136932373

    The logs indicate that four requests were immediately throttled, five requests were sent to the destination service, and one request was queued.

  4. Run the following command to view the number of active connections that the Istio proxy of the client establishes with the pod of the destination service:

    kubectl exec $(kubectl get pod --selector app=circuit-breaker-sample-client --output jsonpath='{.items[0].metadata.name}') -c istio-proxy -- curl -X POST http://localhost:15000/clusters | grep circuit-breaker-sample-server | grep cx_active

    Expected output:

    outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local::172.20.192.124:9080::cx_active::5

    The output indicates that five active connections are established between the Istio proxy of the client and the pod of the destination service.

Scenario 2: One client pod and multiple pods for the destination service

This section verifies whether the connection limit is applied at the pod level or the service level. Assume that one client pod and three pods for the destination service exist.

  • If the connection limit is applied at the pod level, each pod of the destination service has a maximum of five connections.

    In this case, no throttling or queuing should be observed because the maximum number of connections allowed is 15 (3 pods multiplied by 5 connections per pod). Because only 10 requests are sent at a time, all requests should succeed and be answered in about 5 seconds.

  • If the connection limit is applied at the service level, no matter how many pods are running for the destination service, a maximum of five connections are allowed in total.

    In this case, as in the preceding scenario, four requests would be immediately throttled, five would be sent to the destination service, and one would be queued.

  1. Run the following command to scale the destination service deployment to three replicas:

    kubectl scale deployment/circuit-breaker-sample-server  --replicas=3
  2. Restart the client pod and monitor logs.

    Sample logs:

    ----------Info----------
    Status: 503, Start: 03:06:20, End: 03:06:20, Elapsed Time: 0.011791706085205078
    Status: 503, Start: 03:06:20, End: 03:06:20, Elapsed Time: 0.0032286643981933594
    Status: 503, Start: 03:06:20, End: 03:06:20, Elapsed Time: 0.012153387069702148
    Status: 503, Start: 03:06:20, End: 03:06:20, Elapsed Time: 0.011871814727783203
    Status: 200, Start: 03:06:20, End: 03:06:25, Elapsed Time: 5.012892484664917
    Status: 200, Start: 03:06:20, End: 03:06:25, Elapsed Time: 5.013102769851685
    Status: 200, Start: 03:06:20, End: 03:06:25, Elapsed Time: 5.016939163208008
    Status: 200, Start: 03:06:20, End: 03:06:25, Elapsed Time: 5.014261484146118
    Status: 200, Start: 03:06:20, End: 03:06:25, Elapsed Time: 5.01246190071106
    Status: 200, Start: 03:06:20, End: 03:06:30, Elapsed Time: 10.021712064743042

    The logs show throttling and queuing similar to the preceding scenario, which means that increasing the number of instances of the destination service does not increase the connection limit for the client. This indicates that the connection limit is applied at the service level.

  3. Run the following command to view the number of active connections that the Istio proxy of the client establishes with the pods of the destination service:

    kubectl exec $(kubectl get pod --selector app=circuit-breaker-sample-client --output jsonpath='{.items[0].metadata.name}') -c istio-proxy -- curl -X POST http://localhost:15000/clusters | grep circuit-breaker-sample-server | grep cx_active

    Expected output:

    outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local::172.20.192.124:9080::cx_active::2
    outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local::172.20.192.158:9080::cx_active::2
    outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local::172.20.192.26:9080::cx_active::2

    The output indicates that the Istio proxy of the client establishes two active connections with each pod of the destination service. A total of six rather than five connections are established. As mentioned in both Envoy and Istio documentation, a proxy allows some leeway in terms of the number of connections.

Scenario 3: Multiple client pods and one pod for the destination service

  1. Run the following commands to adjust the number of replicas for the destination service and the client:

    kubectl scale deployment/circuit-breaker-sample-server --replicas=1 
    kubectl scale deployment/circuit-breaker-sample-client --replicas=3
  2. Restart the client pod and monitor logs.

    Logs of the clients:

    Client 1
    
    ----------Info----------
    Status: 503, Start: 03:10:40, End: 03:10:40, Elapsed Time: 0.008828878402709961
    Status: 503, Start: 03:10:40, End: 03:10:40, Elapsed Time: 0.010806798934936523
    Status: 503, Start: 03:10:40, End: 03:10:40, Elapsed Time: 0.012855291366577148
    Status: 503, Start: 03:10:40, End: 03:10:40, Elapsed Time: 0.004465818405151367
    Status: 503, Start: 03:10:40, End: 03:10:40, Elapsed Time: 0.007823944091796875
    Status: 503, Start: 03:10:40, End: 03:10:40, Elapsed Time: 0.06221342086791992
    Status: 503, Start: 03:10:40, End: 03:10:40, Elapsed Time: 0.06922149658203125
    Status: 503, Start: 03:10:40, End: 03:10:40, Elapsed Time: 0.06859922409057617
    Status: 200, Start: 03:10:40, End: 03:10:45, Elapsed Time: 5.015282392501831
    Status: 200, Start: 03:10:40, End: 03:10:50, Elapsed Time: 9.378434181213379
    
    Client 2
    
    ----------Info----------
    Status: 503, Start: 03:11:00, End: 03:11:00, Elapsed Time: 0.007795810699462891
    Status: 503, Start: 03:11:00, End: 03:11:00, Elapsed Time: 0.00595545768737793
    Status: 503, Start: 03:11:00, End: 03:11:00, Elapsed Time: 0.013380765914916992
    Status: 503, Start: 03:11:00, End: 03:11:00, Elapsed Time: 0.004278898239135742
    Status: 503, Start: 03:11:00, End: 03:11:00, Elapsed Time: 0.010999202728271484
    Status: 200, Start: 03:11:00, End: 03:11:05, Elapsed Time: 5.015426874160767
    Status: 200, Start: 03:11:00, End: 03:11:05, Elapsed Time: 5.0184690952301025
    Status: 200, Start: 03:11:00, End: 03:11:05, Elapsed Time: 5.019806146621704
    Status: 200, Start: 03:11:00, End: 03:11:05, Elapsed Time: 5.0175628662109375
    Status: 200, Start: 03:11:00, End: 03:11:05, Elapsed Time: 5.031521558761597
    
    Client 3
    
    ----------Info----------
    Status: 503, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.012019157409667969
    Status: 503, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.012546539306640625
    Status: 503, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.013760805130004883
    Status: 503, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.014089822769165039
    Status: 503, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.014792442321777344
    Status: 503, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.015463829040527344
    Status: 503, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.01661539077758789
    Status: 200, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.02904224395751953
    Status: 200, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.03912043571472168
    Status: 200, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.06436014175415039

    The logs indicate that the number of 503 errors on each client increases. The destination service allows only five concurrent requests in total from all three client pods.

  3. View the logs of the client proxies.

    Logs of the client proxies:

    {"authority":"circuit-breaker-sample-server:9080","bytes_received":"0","bytes_sent":"81","downstream_local_address":"192.168.142.207:9080","downstream_remote_address":"172.20.192.31:44610","duration":"0","istio_policy_status":"-","method":"GET","path":"/hello","protocol":"HTTP/1.1","request_id":"d9d87600-cd01-421f-8a6f-dc0ee0ac8ccd","requested_server_name":"-","response_code":"503","response_flags":"UO","route_name":"default","start_time":"2023-02-28T03:14:00.095Z","trace_id":"-","upstream_cluster":"outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local","upstream_host":"-","upstream_local_address":"-","upstream_service_time":"-","upstream_transport_failure_reason":"-","user_agent":"python-requests/2.21.0","x_forwarded_for":"-"}
    
    {"authority":"circuit-breaker-sample-server:9080","bytes_received":"0","bytes_sent":"81","downstream_local_address":"192.168.142.207:9080","downstream_remote_address":"172.20.192.31:43294","duration":"58","istio_policy_status":"-","method":"GET","path":"/hello","protocol":"HTTP/1.1","request_id":"931d080a-3413-4e35-91f4-0c906e7ee565","requested_server_name":"-","response_code":"503","response_flags":"URX","route_name":"default","start_time":"2023-02-28T03:12:20.995Z","trace_id":"-","upstream_cluster":"outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local","upstream_host":"172.20.192.84:9080","upstream_local_address":"172.20.192.31:58742","upstream_service_time":"57","upstream_transport_failure_reason":"-","user_agent":"python-requests/2.21.0","x_forwarded_for":"-"}
    

    You can see two different types of logs for the requests that were throttled. The error code 503 is returned for such requests. The logs indicate that the RESPONSE_FLAGS field has two values: UO and URX.

    • UO: indicates upstream overflow (circuit breaking).

    • URX: indicates that the request was rejected because the upstream HTTP retry limit or the maximum number of TCP connection attempts was reached.

    Based on the values of other fields in the logs, such as DURATION, UPSTREAM_HOST, and UPSTREAM_CLUSTER, we can draw the following conclusion:

    Requests with the UO flag are throttled locally by the client proxies, and requests with the URX flag are rejected by the destination service proxy.

  4. To verify the conclusion in the previous step, check the logs of the destination service proxy.

    Logs of the destination service proxy:

    {"authority":"circuit-breaker-sample-server:9080","bytes_received":"0","bytes_sent":"81","downstream_local_address":"172.20.192.84:9080","downstream_remote_address":"172.20.192.31:59510","duration":"0","istio_policy_status":"-","method":"GET","path":"/hello","protocol":"HTTP/1.1","request_id":"7684cbb0-8f1c-44bf-b591-40c3deff6b0b","requested_server_name":"outbound_.9080_._.circuit-breaker-sample-server.default.svc.cluster.local","response_code":"503","response_flags":"UO","route_name":"default","start_time":"2023-02-28T03:14:00.095Z","trace_id":"-","upstream_cluster":"inbound|9080||","upstream_host":"-","upstream_local_address":"-","upstream_service_time":"-","upstream_transport_failure_reason":"-","user_agent":"python-requests/2.21.0","x_forwarded_for":"-"}
    {"authority":"circuit-breaker-sample-server:9080","bytes_received":"0","bytes_sent":"81","downstream_local_address":"172.20.192.84:9080","downstream_remote_address":"172.20.192.31:58218","duration":"0","istio_policy_status":"-","method":"GET","path":"/hello","protocol":"HTTP/1.1","request_id":"2aa351fa-349d-4283-a5ea-dc74ecbdff8c","requested_server_name":"outbound_.9080_._.circuit-breaker-sample-server.default.svc.cluster.local","response_code":"503","response_flags":"UO","route_name":"default","start_time":"2023-02-28T03:12:20.996Z","trace_id":"-","upstream_cluster":"inbound|9080||","upstream_host":"-","upstream_local_address":"-","upstream_service_time":"-","upstream_transport_failure_reason":"-","user_agent":"python-requests/2.21.0","x_forwarded_for":"-"}

    As expected, the response code 503 appears in the logs of the destination service proxy. That is the reason why the logs of the client proxies contain "response_code":"503" and "response_flags":"URX".

In summary, each client proxy enforces the limit of five connections to the destination service and throttles or queues excess requests locally, marking the throttled requests with the UO response flag. At the start of a batch, the three client proxies can send up to 15 parallel requests in total. However, only five of them succeed because the destination service proxy also limits the number of connections to five: it accepts five requests and throttles the rest, which are marked with the URX response flag in the logs of the client proxies.

Figure: How requests are sent from multiple client pods to a single destination service pod in the preceding scenario.

Scenario 4: Multiple pods for both the client and the destination service

When you increase the number of replicas of the destination service, the overall success rate of requests rises because each destination service proxy allows five parallel requests. In this scenario, throttling can be observed on both the client proxies and the destination service proxies.

  1. Run the following commands to increase the number of replicas of the destination service to 2 and the number of replicas of the client to 3:

    kubectl scale deployment/circuit-breaker-sample-server --replicas=2
    kubectl scale deployment/circuit-breaker-sample-client --replicas=3

    You can see that 10 requests are successful out of the 30 requests generated by all 3 client proxies in a batch.

  2. Run the following command to increase the number of replicas of the destination service to 3:

    kubectl scale deployment/circuit-breaker-sample-server --replicas=3

    You can see that 15 requests are successful.

  3. Run the following command to increase the number of replicas of the destination service to 4:

    kubectl scale deployment/circuit-breaker-sample-server --replicas=4

    After the number of replicas of the destination service is increased from 3 to 4, you still see only 15 successful requests. The limit on client proxies applies to the entire destination service rather than to individual replicas. Therefore, regardless of how many replicas the destination service has, each client proxy can send a maximum of five concurrent requests to the destination service.
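
If you need more total concurrency after scaling out the destination service, you must also raise the client-side limits in the destination rule; scaling the service alone is not sufficient. The following sketch shows how you might relax the limits for the sample service. The values are illustrative assumptions, not recommendations.

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: circuit-breaker-sample-server
spec:
  host: circuit-breaker-sample-server
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 10            # illustrative value: 10 connections per client proxy
      http:
        http1MaxPendingRequests: 2    # illustrative value: 2 queued requests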

Related operations

View metrics related to circuit breaking of connection pools

Circuit breaking of connection pools is implemented by limiting the maximum number of TCP connections to a destination host. When circuit breaking occurs, a series of related metrics are generated. These metrics help you determine whether circuit breaking occurs. The following table describes some metrics.

| Metric | Type | Description |
| --- | --- | --- |
| envoy_cluster_circuit_breakers_default_cx_open | Gauge | Indicates whether circuit breaking is triggered for a connection pool. The value 1 indicates that circuit breaking is triggered. The value 0 indicates that circuit breaking is not triggered. |
| envoy_cluster_circuit_breakers_default_rq_pending_open | Gauge | Indicates whether the number of requests queued while waiting for a ready connection has exceeded the configured limit. The value 1 indicates that the limit has been exceeded. The value 0 indicates that it has not. |

You can configure proxyStatsMatcher for a sidecar proxy so that the proxy reports metrics related to circuit breaking. Then, you can use Prometheus to collect and view the metrics.
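
For reference, the following sketch shows the equivalent configuration in open-source Istio, added as an annotation to the pod template of a workload such as circuit-breaker-sample-client. The annotation-based approach is an assumption for illustration; in ASM, you configure proxyStatsMatcher in the console as described in the following steps.

# Fragment of a Deployment pod template (sketch): the annotation asks the sidecar
# to also report Envoy statistics that match the regular expression.
template:
  metadata:
    annotations:
      proxy.istio.io/config: |
        proxyStatsMatcher:
          inclusionRegexps:
          - ".*circuit_breaker.*"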

  1. Configure proxyStatsMatcher to enable a sidecar proxy to report metrics related to circuit breaking. After you select proxyStatsMatcher, select Regular Expression Match and set the value to .*circuit_breaker.*. For more information, see proxyStatsMatcher.

  2. Redeploy the Deployments for circuit-breaker-sample-server and circuit-breaker-sample-client. For more information, see Redeploy workloads.

  3. Complete the circuit breaking configuration of connection pools and perform request tests by following the preceding steps.

  4. Run the following command to view the metrics related to circuit breaking of the connection pool for the circuit-breaker-sample-client service:

    kubectl exec -it deploy/circuit-breaker-sample-client -c istio-proxy -- curl localhost:15090/stats/prometheus | grep circuit_breaker | grep circuit-breaker-sample-server

    Expected output:

    envoy_cluster_circuit_breakers_default_cx_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 1
    envoy_cluster_circuit_breakers_default_cx_pool_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_default_remaining_cx{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_default_remaining_cx_pools{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 18446744073709551613
    envoy_cluster_circuit_breakers_default_remaining_pending{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 1
    envoy_cluster_circuit_breakers_default_remaining_retries{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 4294967295
    envoy_cluster_circuit_breakers_default_remaining_rq{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 4294967295
    envoy_cluster_circuit_breakers_default_rq_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_default_rq_pending_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_default_rq_retry_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_high_cx_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_high_cx_pool_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_high_rq_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_high_rq_pending_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_high_rq_retry_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0

Configure metric collection and alerts for circuit breaking of connection pools

After you configure metrics related to circuit breaking of connection pools, you can configure settings to collect the metrics to Prometheus and configure alert rules based on key metrics. This way, alerts can be generated when circuit breaking occurs. The following section demonstrates how to configure metric collection and alerts for circuit breaking of connection pools. In this example, Managed Service for Prometheus is used.

  1. In Managed Service for Prometheus, integrate the cluster on the data plane with the Alibaba Cloud ASM component, or upgrade the component to the latest version. This ensures that the exposed metrics related to circuit breaking can be collected by Managed Service for Prometheus. For more information about how to integrate components into ARMS, see Component management. (If you have configured a self-managed Prometheus instance to collect the metrics of your ASM instance by referring to Monitor ASM instances by using a self-managed Prometheus instance, you do not need to perform this step.)

  2. Create an alert rule for circuit breaking of connection pools. For more information, see Use a custom PromQL statement to create an alert rule. The following example demonstrates how to specify key parameters for configuring an alert rule. For more information about how to configure other parameters, see the preceding documentation.

| Parameter | Example | Description |
| --- | --- | --- |
| Custom PromQL Statements | (sum by(cluster_name, pod_name, namespace) (envoy_cluster_circuit_breakers_default_cx_open)) != 0 | In this example, the envoy_cluster_circuit_breakers_default_cx_open metric is queried to determine whether circuit breaking is occurring in connection pools of the current cluster. Based on the hostname of the upstream service and the name of the pod that reports the metric, you can determine where circuit breaking occurs. |
| Alert Message | Circuit breaking occurs for a connection pool. The number of TCP connections established by the sidecar proxy has reached the upper limit. Namespace: {{$labels.namespace}}, Pod in which circuit breaking occurs for a connection pool: {{$labels.pod_name}}, Information about the upstream service: {{ $labels.cluster_name }} | The alert message in the example indicates the pod in which circuit breaking for a connection pool occurs, the upstream service to which the pod connects, and the namespace to which the pod belongs. |
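
For reference, if you collect these metrics with a self-managed Prometheus instance instead, a standard Prometheus alerting rule based on the same PromQL statement might look like the following sketch. The rule group name, alert name, evaluation duration, and severity label are illustrative assumptions.

groups:
  - name: asm-connection-pool-circuit-breaking    # illustrative group name
    rules:
      - alert: ConnectionPoolCircuitBreaking      # illustrative alert name
        expr: (sum by(cluster_name, pod_name, namespace) (envoy_cluster_circuit_breakers_default_cx_open)) != 0
        for: 1m                                   # illustrative evaluation duration
        labels:
          severity: warning                       # illustrative severity
        annotations:
          message: >-
            Circuit breaking occurs for a connection pool.
            Namespace: {{ $labels.namespace }},
            Pod: {{ $labels.pod_name }},
            Upstream service: {{ $labels.cluster_name }}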

Constraints on connection pool configurations

The following table describes the constraints on the configurations of the connectionPool field on the client and the destination service.

| Role | Description |
| --- | --- |
| Client | Each client proxy implements the limit independently. If the limit on the number of requests is 100, each client proxy can have 100 outstanding requests before local throttling is applied. If N clients call the destination service, the maximum number of outstanding requests that are supported is the product of 100 and N. The limit on client proxies applies to the entire destination service, not to a single replica of the destination service. Even if the destination service runs in 200 active pods, a maximum of 100 requests are allowed. |
| Destination service | The limit applies to each destination service proxy. If the service runs in 50 active pods, each pod can have up to 100 outstanding requests sent from client proxies before throttling is triggered and the response code 503 is returned. |