The traffic scheduling suite of Service Mesh (ASM) supports priority-based request scheduling policies. When the system is overloaded, high-priority requests can be processed preferentially. This topic describes how to use AverageLatencySchedulingPolicy provided by the traffic scheduling suite to implement priority-based request scheduling.
Background information
A priority-based request scheduling policy compares the real-time latency with the historical average latency to determine whether traffic overload occurs. If traffic overload occurs, scheduling mechanisms based on token bucket and priority are used to schedule requests. The priority-based request scheduling policy works in the following way:
Overload detection: This policy compares the average latency in the past period with the current latency to determine whether the system is overloaded.
Adjustment of the token issuance rate: If an overload occurs, the monitoring data obtained in the previous step is sent to a dedicated controller, which will control the fill rate of the token bucket.
Request scheduling: Different requests have different priorities. When an overload occurs, requests with higher priorities have a greater chance of obtaining tokens.
This policy can be used to queue requests when services are congested due to high request concurrency and response latencies continue to increase. Different from commonly used throttling policies, this policy does not directly reject requests but puts them into a priority queue. The request rate is limited by using the token bucket mechanism, and the request processing order is adjusted based on request priorities.
Prerequisites
A Container Service for Kubernetes (ACK) managed cluster is added to your ASM instance, and the version of your ASM instance is V1.21.6.44 or later. For more information, see Add a cluster to an ASM instance.
Automatic sidecar proxy injection is enabled for the default namespace in the ACK cluster. For more information, see Manage global namespaces.
You have connected to the ACK cluster by using kubectl. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
The ASM traffic scheduling suite is enabled. For more information, see Enable the ASM traffic scheduling suite.
The HTTPBin application is deployed and can be accessed by using an ASM gateway. For more information, see Deploy the HTTPBin application.
(Optional) The features of generating and collecting access logs of the ASM gateway are enabled. For more information, see Configure the features of generating and collecting the access logs of an ASM gateway.
Step 1: Create AverageLatencySchedulingPolicy
Use kubectl to connect to the ASM instance. For more information, see Use kubectl on the control plane to access Istio resources.
Create an AverageLatencySchedulingPolicy.yaml file that contains the following content:
apiVersion: istio.alibabacloud.com/v1
kind: AverageLatencySchedulingPolicy
metadata:
  name: workload-prioritization
  namespace: istio-system
spec:
  load_scheduling_core:
    aimd_load_scheduler:
      load_scheduler:
        workload_latency_based_tokens: true
        selectors:
        - service: httpbin.default.svc.cluster.local
        scheduler:
          workloads:
          - label_matcher:
              match_labels:
                http.request.header.user_type: "guest"
            parameters:
              priority: 50.0
            name: "guest"
          - label_matcher:
              match_labels:
                http.request.header.user_type: "subscriber"
            parameters:
              priority: 200.0
            name: "subscriber"
The following list describes some of the fields:
workload_latency_based_tokens: Indicates whether to dynamically adjust the number of tokens based on the average latency of the workload. The time window for estimating the average latency is the last 30 minutes.
service: The service on which the scheduling policy takes effect.
workloads: Defines two types of requests based on the user_type request header: guest and subscriber. The priority of a guest request is 50, and the priority of a subscriber request is 200.
For more information about the fields supported by AverageLatencySchedulingPolicy, see Description of AverageLatencySchedulingPolicy fields.
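After you save the file, run the following command to create AverageLatencySchedulingPolicy on the ASM control plane. This is a minimal sketch, assuming that your kubeconfig points to the ASM instance and that the file is saved as AverageLatencySchedulingPolicy.yaml:
kubectl apply -f AverageLatencySchedulingPolicy.yaml
You can then run a command similar to the following to confirm that the resource is created. The resource name used here is an assumption derived from the CRD group istio.alibabacloud.com:
kubectl get averagelatencyschedulingpolicies.istio.alibabacloud.com -n istio-system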
Step 2: Perform tests
In this example, the load testing tool fortio is used. For more information, see Install fortio.
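If a Go toolchain is available, one possible way to install fortio locally is shown below. This command is provided for convenience and is an assumption; the linked topic describes the supported installation methods:
go install fortio.org/fortio@latest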
First, simulate normal service requests so that the policy can estimate the average latency of requests:
fortio load -c 20 -qps 100000 -t 60m http://${IP address of the ASM gateway}/status/200
Three minutes after you run the preceding command, open two new terminals to send test requests. During the entire test, make sure that the preceding fortio process keeps running and do not close its terminal.
Run the following two load testing commands, one in each terminal. Start the two tests at the same time as much as possible.
fortio load -c 40 -qps 100000 -H "user_type:guest" -t 3m http://${IP address of the ASM gateway}/status/201
fortio load -c 40 -qps 100000 -H "user_type:subscriber" -t 3m http://${IP address of the ASM gateway}/status/202
The two commands use different request paths so that the test results of the two users can be distinguished in the access logs.
Step 3: Analyze the test results
The following code blocks show the last few lines of the outputs of the two commands in the current test environment:
Code 201 : 26852 (97.8 %)
Code 503 : 601 (2.2 %)
Response Header Sizes : count 27453 avg 242.91564 +/- 36.35 min 0 max 249 sum 6668763
Response Body/Total Sizes : count 27453 avg 246.17754 +/- 14.56 min 149 max 249 sum 6758312
All done 27453 calls (plus 40 warmup) 262.318 ms avg, 152.4 qps
Code 202 : 52765 (100.0 %)
Response Header Sizes : count 52765 avg 248.86358 +/- 0.5951 min 248 max 250 sum 13131287
Response Body/Total Sizes : count 52765 avg 248.86358 +/- 0.5951 min 248 max 250 sum 13131287
All done 52765 calls (plus 40 warmup) 136.472 ms avg, 292.9 qps
The code blocks indicate that:
All requests from the subscriber user were successful, and no HTTP 503 status code was returned. In contrast, an HTTP 503 status code was returned for 2.2% of the requests from the guest user.
The average latency of requests from the subscriber user was about 136 ms, whereas the average latency of requests from the guest user was about 262 ms. In addition, the queries per second (QPS) of the two users differs significantly.
It can be inferred that AverageLatencySchedulingPolicy preferentially processes the requests of the subscriber user when the request latency increases.
The test results in this topic are for reference only. The actual data is subject to your operating environment, and load test results vary across test environments. Only the relative relationship between the data is analyzed in this example.
(Optional) Step 4: Analyze the test results based on the access logs of the ASM gateway
Log on to the ASM console. In the left-side navigation pane, choose Mesh Management.
On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose ASM Gateways.
Click the name of the ASM gateway to go to the Gateway overview page. Then, click Gateway Logs to view the access logs of the ASM gateway.
The access paths of the guest and subscriber users are different. You can use the access paths to retrieve the access results of the two users respectively:
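For example, if the access logs of the ASM gateway are collected to Simple Log Service, query statements similar to the following can be entered in the log query box. The path and response_code field names are assumptions based on the default access log format of the ASM gateway; adjust them to the field names in your logs.
Retrieve the requests from the guest user:
path: "/status/201"
Retrieve the requests from the subscriber user:
path: "/status/202"
Retrieve the guest requests that were rejected during scheduling:
path: "/status/201" and response_code: 503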
References
You can verify whether AverageLatencySchedulingPolicy takes effect by using Grafana. Make sure that the Prometheus instance that serves as the Grafana data source has been configured to collect metrics from the ASM traffic scheduling suite.
You can import the following content into Grafana to create a dashboard for AverageLatencySchedulingPolicy.
The dashboard is as follows.