
Alibaba Cloud Service Mesh:Use ConcurrencySchedulingPolicy to implement priority-based request scheduling under controlled concurrency

Last Updated: Sep 04, 2024

This topic describes how to use ConcurrencySchedulingPolicy provided by the traffic scheduling suite to implement priority-based request scheduling under controlled concurrency.

Background information

ConcurrencySchedulingPolicy determines whether traffic is overloaded based on a limit on concurrent requests. When the number of concurrent requests exceeds the specified upper limit, subsequent requests are queued and scheduled based on their priorities. ConcurrencySchedulingPolicy works in the following way:

  • ConcurrencySchedulingPolicy uses a concurrency limiter to track the number of requests that are being processed and to determine whether that number has reached the upper limit.

  • When the number of concurrent requests reaches the upper limit, subsequent requests are queued and sent to the destination service only after previous requests finish. This keeps the number of concurrent requests within the limit. In addition, high-priority requests have a greater chance of being dequeued and sent to the destination service.

When the number of concurrent requests exceeds the upper limit that the system can process, ConcurrencySchedulingPolicy does not reject the excess requests but places them in a priority queue. This differs from ConcurrencyLimitingPolicy, which rejects requests once the specified threshold is reached. ConcurrencySchedulingPolicy schedules requests based on their priorities while keeping the number of concurrent requests within the upper limit.
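
To make this mechanism concrete, the following Go program is a minimal conceptual sketch, not ASM's actual implementation: a semaphore caps the number of in-flight requests, and each freed slot is granted to a queued workload with probability proportional to its priority weight. The workload names and weights mirror the policy created in Step 1.

    package main

    import (
    	"fmt"
    	"math/rand"
    	"sync"
    	"time"
    )

    type workload struct {
    	name     string
    	priority float64
    	queued   chan struct{} // one item per waiting request
    	admitted int
    }

    func main() {
    	const maxConcurrency = 10 // mirrors max_concurrency in the policy
    	guest := &workload{name: "guest", priority: 50, queued: make(chan struct{}, 100)}
    	subscriber := &workload{name: "subscriber", priority: 200, queued: make(chan struct{}, 100)}

    	// 100 requests of each type are waiting; only 50 slots are granted below,
    	// so the two workloads compete for every slot.
    	for i := 0; i < 100; i++ {
    		guest.queued <- struct{}{}
    		subscriber.queued <- struct{}{}
    	}

    	slots := make(chan struct{}, maxConcurrency) // the concurrency limiter
    	var wg sync.WaitGroup
    	for i := 0; i < 50; i++ {
    		slots <- struct{}{} // block until a concurrency slot is free
    		// Weighted pick: subscriber wins 200/(200+50) = 80% of the slots.
    		w := guest
    		if rand.Float64()*(guest.priority+subscriber.priority) < subscriber.priority {
    			w = subscriber
    		}
    		<-w.queued // dequeue one waiting request of the chosen type
    		w.admitted++
    		wg.Add(1)
    		go func() {
    			defer wg.Done()
    			time.Sleep(10 * time.Millisecond) // simulate request processing
    			<-slots                           // release the slot on completion
    		}()
    	}
    	wg.Wait()
    	fmt.Printf("admitted: guest=%d subscriber=%d\n", guest.admitted, subscriber.admitted)
    	// Typical output: admitted: guest=10 subscriber=40, roughly the 1:4 priority ratio.
    }

In the real policy, queued requests are also subject to the max_inflight_duration timeout described below; this sketch omits that detail.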

Prerequisites

Step 1: Create ConcurrencySchedulingPolicy

  1. Use kubectl to connect to your ASM instance. For more information, see Use kubectl on the control plane to access Istio resources.

  2. Create a concurrencyschedulingpolicy.yaml file that contains the following content:

    apiVersion: istio.alibabacloud.com/v1
    kind: ConcurrencySchedulingPolicy
    metadata:
      name: concurrencyscheduling
      namespace: istio-system
    spec:
      concurrency_scheduler:
        max_concurrency: 10            # Allow at most 10 concurrent requests.
        concurrency_limiter:
          max_inflight_duration: 60s   # Treat a request as finished after 60s without a response.
        scheduler:
          workloads:
            # Requests whose user_type header is guest get priority 50.
            - label_matcher:
                match_labels:
                  http.request.header.user_type: guest
              parameters:
                priority: 50.0
              name: guest
            # Requests whose user_type header is subscriber get priority 200.
            - label_matcher:
                match_labels:
                  http.request.header.user_type: subscriber
              parameters:
                priority: 200.0
              name: subscriber
        selectors:
        - service: httpbin.default.svc.cluster.local   # Apply the policy to this service.

    The following list describes some of the parameters. For more information about configuration items, see Description of ConcurrencySchedulingPolicy fields.

    max_concurrency
      The maximum number of concurrent requests. In this example, this field is set to 10, which means the service is allowed to process 10 requests at a time.

    max_inflight_duration
      The timeout period for request processing. Due to unexpected events such as pod restarts in the cluster, the ASM traffic scheduling suite may fail to record the end of a request. To prevent such requests from skewing the concurrency limiting algorithm, specify a timeout period: if a request has received no response within this period, the system considers it processed. Set this field based on the expected maximum response time of a request. In this example, this field is set to 60s.

    workloads
      Two types of requests are defined based on the user_type request header: guest and subscriber. Requests of the guest type have a priority of 50, and requests of the subscriber type have a priority of 200.

    selectors
      The services to which the policy is applied. In this example, service: httpbin.default.svc.cluster.local indicates that the policy is applied to the httpbin.default.svc.cluster.local service.

  3. Run the following command to create ConcurrencySchedulingPolicy:

    kubectl apply -f concurrencyschedulingpolicy.yaml
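
    Optionally, confirm that the policy object exists. The plural resource name in the following command is inferred from the kind and API group and may differ in your cluster:

    kubectl -n istio-system get concurrencyschedulingpolicies.istio.alibabacloud.com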

Step 2: Verify priority-based request scheduling under controlled concurrency

In this example, the stress testing tool Fortio is used. For more information, see the Installation section of Fortio on the GitHub website.
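
If Fortio is not yet installed, the following commands from the Fortio README are common ways to install it:

    # Install with the Go toolchain:
    go install fortio.org/fortio@latest
    # Or on macOS with Homebrew:
    brew install fortio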

  1. Open two terminals and run the following stress testing commands at the same time, one in each terminal, and keep both running for the entire duration of the test. Each test sends 10 concurrent requests to the service at up to 10,000 queries per second (QPS), which far exceeds the number of concurrent requests that the service can bear.

    fortio load -c 10 -qps 10000  -H "user_type:guest" -t 30s -timeout 60s -a http://${IP address of the ASM ingress gateway}/status/201
    fortio load -c 10 -qps 10000  -H "user_type:subscriber" -t 30s -timeout 60s -a http://${IP address of the ASM ingress gateway}/status/202
    Note

    Replace ${IP address of the ASM ingress gateway} in the preceding commands with the IP address of your ASM ingress gateway. For more information about how to obtain the IP address of the ASM ingress gateway, see substep 1 of Step 3 in the Use Istio resources to route traffic to different versions of a service topic.
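
    If your ASM ingress gateway is exposed as a LoadBalancer service named istio-ingressgateway in the istio-system namespace (the name and namespace may differ in your deployment), a command along the following lines retrieves its IP address:

    kubectl -n istio-system get service istio-ingressgateway \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}'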

    Expected output from test 1:

    ...
    # target 50% 4.35294
    # target 75% 5.39689
    # target 90% 5.89697
    # target 99% 6.19701
    # target 99.9% 6.22702
    Sockets used: 10 (for perfect keepalive, would be 10)
    Uniform: false, Jitter: false
    Code 201 : 84 (100.0 %)
    Response Header Sizes : count 84 avg 249.88095 +/- 0.3587 min 248 max 250 sum 20990
    Response Body/Total Sizes : count 84 avg 249.88095 +/- 0.3587 min 248 max 250 sum 20990
    All done 84 calls (plus 10 warmup) 3802.559 ms avg, 2.6 qps
    Successfully wrote 5186 bytes of Json data to xxxxxx.json

    Record the name of the JSON file output by test 1, for example, xxxxxx.json.

    Expected output from test 2:

    ...
    # target 50% 1.18121
    # target 75% 1.63423
    # target 90% 1.90604
    # target 99% 2.22941
    # target 99.9% 2.28353
    Sockets used: 10 (for perfect keepalive, would be 10)
    Uniform: false, Jitter: false
    Code 202 : 270 (100.0 %)
    Response Header Sizes : count 270 avg 250.52963 +/- 0.5418 min 249 max 251 sum 67643
    Response Body/Total Sizes : count 270 avg 250.52963 +/- 0.5418 min 249 max 251 sum 67643
    All done 270 calls (plus 10 warmup) 1117.614 ms avg, 8.8 qps
    Successfully wrote 5305 bytes of Json data to yyyyyy.json

    Record the name of the JSON file output by test 2, for example, yyyyyy.json.

    The preceding outputs show that the average request latency of test 2 is about one quarter of that of test 1, and its QPS (8.8) is about four times that of test 1 (2.6). This matches the policy defined earlier: the priority of subscriber requests (200) is four times that of guest requests (50), so subscriber requests receive about 200/(200+50) = 80% of the concurrency slots and guest requests receive the remaining 20%, a 4:1 split.

  2. (Optional) Visualize the results.

    1. Run the following command in the local directory where the stress testing commands in the previous step were executed to start the local Fortio server:

      fortio server
    2. Use a browser to access http://localhost:8080/fortio/browse and click the name of the JSON file that you recorded in substep 1 to view the visualized test results.

      Example of visualized results for test 1: (image)

      Example of visualized results for test 2: (image)

      The preceding visualized results show that, except for a few requests that were admitted without queuing, most requests of the guest type have a latency of 4,000 to 6,000 ms, whereas most requests of the subscriber type have a latency of 1,000 to 2,000 ms. When requests to the service exceed the upper limit, requests of the subscriber type are served first. In addition, the number of concurrent requests sent to the service is kept within the specified limit.