Alibaba Cloud Service Mesh:Use QuotaSchedulingPolicy to implement request call quota management

Last Updated: Sep 04, 2024

The QuotaSchedulingPolicy CustomResourceDefinition (CRD) provided by the Alibaba Cloud Service Mesh (ASM) traffic scheduling suite supports priority-based request scheduling after a specified request call quota is reached. When the rate of requests processed in the system exceeds the specified quota, subsequent requests are queued, and requests with higher priorities are processed sooner. This topic describes how to use QuotaSchedulingPolicy provided by the traffic scheduling suite to implement request call quota management.

Background information

QuotaSchedulingPolicy uses the token bucket algorithm to control the rate of requests sent to a specified service and queues requests when the rate exceeds the specified quota. QuotaSchedulingPolicy works in the following way:

  1. A rate limiter that uses the token bucket algorithm is used to limit the request rate. For more information about the algorithm implementation of the rate limiter, see the Background information section in Use RateLimitingPolicy to implement user-specific throttling.

  2. When the request rate exceeds the quota, subsequent requests are queued and forwarded to the destination service only after earlier requests have been processed. This keeps the request rate at the specified value. In addition, high-priority requests have a greater chance of being dequeued and sent to the destination service first.

Unlike in a throttling scenario, QuotaSchedulingPolicy does not directly reject requests when the request rate exceeds the quota. Instead, it places them in a priority queue and schedules them based on their priorities while keeping the request rate within the quota.
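To make this behavior concrete, the following Python sketch models a token bucket combined with priority-weighted dequeueing. It is only an illustration of the scheduling idea, not ASM's actual implementation (ASM enforces the policy in the data plane). Names such as TokenBucket and WeightedScheduler are hypothetical, and the weighted random pick is a simplified stand-in for real weighted fair queueing.

    import collections
    import random
    import time

    class TokenBucket:
        """Adds fill_amount tokens every interval_s seconds, capped at bucket_capacity."""

        def __init__(self, fill_amount, interval_s, bucket_capacity):
            self.fill_amount = fill_amount
            self.interval_s = interval_s
            self.capacity = bucket_capacity
            self.tokens = float(bucket_capacity)
            self.last_refill = time.monotonic()

        def try_acquire(self):
            # Refill for every full interval that has elapsed, then spend one token if possible.
            now = time.monotonic()
            refills = int((now - self.last_refill) // self.interval_s)
            if refills:
                self.tokens = min(self.capacity, self.tokens + refills * self.fill_amount)
                self.last_refill += refills * self.interval_s
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    class WeightedScheduler:
        """Keeps one FIFO queue per workload; backlogged workloads are drained in
        proportion to their priority, so higher-priority requests wait less."""

        def __init__(self, bucket, priorities):
            self.bucket = bucket
            self.priorities = priorities  # for example, {"guest": 50.0, "subscriber": 200.0}
            self.queues = {name: collections.deque() for name in priorities}

        def submit(self, workload, request):
            # Within quota and nothing queued ahead: forward immediately.
            if not any(self.queues.values()) and self.bucket.try_acquire():
                return request
            # Over quota: the request waits in its workload's queue instead of being rejected.
            self.queues[workload].append(request)
            return None

        def drain_one(self):
            # Called whenever a token may have become available.
            backlogged = [w for w, q in self.queues.items() if q]
            if not backlogged or not self.bucket.try_acquire():
                return None
            weights = [self.priorities[w] for w in backlogged]
            chosen = random.choices(backlogged, weights=weights)[0]
            return self.queues[chosen].popleft()

    # fill_amount=10 and interval=1s give a steady rate of 10 requests per second;
    # a bucket_capacity equal to fill_amount leaves no headroom for bursts.
    bucket = TokenBucket(fill_amount=10, interval_s=1.0, bucket_capacity=10)
    scheduler = WeightedScheduler(bucket, {"guest": 50.0, "subscriber": 200.0})
    print(scheduler.submit("subscriber", "GET /status/202"))  # forwarded immediately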

Prerequisites

Step 1: Create QuotaSchedulingPolicy

  1. Use kubectl to connect to the ASM instance. For more information, see Use kubectl on the control plane to access Istio resources.

  2. Create a quotaschedulingpolicy.yaml file that contains the following content:

    apiVersion: istio.alibabacloud.com/v1
    kind: QuotaSchedulingPolicy
    metadata:
      name: quotascheduling
      namespace: istio-system
    spec:
      quota_scheduler:
        bucket_capacity: 10
        fill_amount: 10
        rate_limiter:
          interval: 1s
        scheduler:
          workloads:
            - label_matcher:
                match_labels:
                  http.request.header.user_type: guest
              parameters:
                priority: 50.0
              name: guest
            - label_matcher:
                match_labels:
                  http.request.header.user_type: subscriber
              parameters:
                priority: 200.0
              name: subscriber
        selectors:
        - service: httpbin.default.svc.cluster.local

    The following list describes some of the fields. For more information about the related fields, see Description of QuotaSchedulingPolicy fields.

    - fill_amount: The number of tokens added to the token bucket within each interval specified by the interval field. In this example, the value is 10, which indicates that 10 tokens are added to the token bucket after each interval elapses.

    - interval: The interval at which tokens are added to the token bucket. In this example, the value is 1s, which indicates that the token bucket is filled with 10 tokens every second.

    - bucket_capacity: The maximum number of tokens that the token bucket can hold. When the request rate is lower than the bucket filling rate, the number of tokens in the bucket keeps increasing until bucket_capacity is reached. bucket_capacity allows a certain degree of burst traffic. In this example, the value is 10, which is the same as the value of fill_amount, so no burst traffic is allowed.

    - workloads: Defines two types of requests based on the user_type request header: guest and subscriber. Requests of the guest type have a priority of 50, and requests of the subscriber type have a priority of 200.

    - selectors: The services to which the policy is applied. In this example, the value is httpbin.default.svc.cluster.local, which indicates that request quota scheduling is applied to the httpbin.default.svc.cluster.local service.

  3. Run the following command to create QuotaSchedulingPolicy:

    kubectl apply -f quotaschedulingpolicy.yaml
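You can optionally confirm that the policy object has been created by querying it through the same manifest file. This check only verifies that the resource exists in the cluster; it does not verify that the data plane has started enforcing the quota.

    kubectl get -f quotaschedulingpolicy.yaml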

Step 2: Verify whether QuotaSchedulingPolicy takes effect

In this example, the stress testing tool Fortio is used. For more information, see the Installation section of Fortio on the GitHub website.

  1. Open two terminals and run the following two stress testing commands at the same time. Keep both tests running for their entire duration. Each test sends requests to the service over 10 concurrent connections at up to 10,000 queries per second (QPS), which far exceeds the quota of 10 requests per second configured for the service.

    fortio load -c 10 -qps 10000  -H "user_type:guest" -t 30s -timeout 60s -a http://${IP address of the ASM ingress gateway}/status/201
    fortio load -c 10 -qps 10000  -H "user_type:subscriber" -t 30s -timeout 60s -a http://${IP address of the ASM ingress gateway}/status/202
    Note

    Replace ${IP address of the ASM ingress gateway} in the preceding commands with the IP address of your ASM ingress gateway. For more information about how to obtain the IP address of the ASM ingress gateway, see substep 1 of Step 3 in the Use Istio resources to route traffic to different versions of a service topic.

    Expected output from test 1:

    ...
    # target 50% 4.83333
    # target 75% 5.20763
    # target 90% 5.38203
    # target 99% 5.48668
    # target 99.9% 5.49714
    Sockets used: 10 (for perfect keepalive, would be 10)
    Uniform: false, Jitter: false
    Code 201 : 70 (100.0 %)
    Response Header Sizes : count 70 avg 249.94286 +/- 0.2871 min 248 max 250 sum 17496
    Response Body/Total Sizes : count 70 avg 249.94286 +/- 0.2871 min 248 max 250 sum 17496
    All done 70 calls (plus 10 warmup) 4566.839 ms avg, 2.1 qps
    Successfully wrote 4693 bytes of Json data to 2024-07-26-232250_114_55_5_155_status_201_iZbp1cz9ur77robaiv085tZ.json

    Expected output from test 2:

    ...
    # target 50% 0.253333
    # target 75% 1.875
    # target 90% 4.26635
    # target 99% 4.47301
    # target 99.9% 4.49367
    Sockets used: 10 (for perfect keepalive, would be 10)
    Uniform: false, Jitter: false
    Code 202 : 250 (100.0 %)
    Response Header Sizes : count 250 avg 250.264 +/- 0.4408 min 250 max 251 sum 62566
    Response Body/Total Sizes : count 250 avg 250.264 +/- 0.4408 min 250 max 251 sum 62566
    All done 250 calls (plus 10 warmup) 1226.657 ms avg, 8.0 qps
    Successfully wrote 4509 bytes of Json data to 2024-07-26-232250_114_55_5_155_status_202_iZbp1cz9ur77robaiv085tZ.json

    The preceding outputs show that the average request latency in test 2 is about one quarter of that in test 1, and that its QPS is about four times that of test 1. This is because, in the policy defined earlier, the priority of subscriber requests (200) is four times that of guest requests (50). A total of 320 requests are processed within 30 seconds across the two tests. Excluding the 20 warm-up requests, the request rate received by the service is exactly 10 requests per second, which proves that the requests received by the service always stay within the specified quota.
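    As a rough plausibility check, if you assume that the scheduler shares the quota approximately in proportion to the configured priorities, a quota of 10 requests per second splits into about 8 requests per second for subscriber and 2 requests per second for guest, which matches the 8.0 QPS and 2.1 QPS reported by Fortio. The following Python snippet performs this calculation; the proportional-sharing assumption is an approximation used only for illustration.

    priorities = {"guest": 50.0, "subscriber": 200.0}
    quota_rps = 10
    total_weight = sum(priorities.values())
    for name, weight in priorities.items():
        # guest -> 2.0 rps, subscriber -> 8.0 rps
        print(name, quota_rps * weight / total_weight)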