In version 1.21 and later, Service Mesh (ASM) provides a new load balancing algorithm: peak exponentially weighted moving average (peak EWMA). This algorithm computes a moving average over static weights, latencies, error rates, and other factors to score nodes, and then selects suitable nodes for load balancing. When a backend service needs to handle burst traffic, ASM can use the peak EWMA load balancing algorithm to take the maximum load and real-time response time of backend service pods into account and flexibly distribute traffic to suitable pods. This topic describes how to configure and use peak EWMA to implement load balancing based on workload latency.
Background information
ASM provides a variety of common load balancing algorithms, including round robin, least request, and random. These algorithms meet the requirements of most business scenarios and deliver predictable performance. However, they select backend service pods only based on static rules and do not consider the real-time status and performance of the pods.
For example, even if the resources on the host of a backend service pod are occupied by other applications, an ASM instance that uses the default load balancing algorithm may still select this pod instead of other idle pods. As a result, the backend service responds to requests with higher latency, or requests even time out. If the load balancing algorithm can intelligently bypass the pod with deteriorated performance and route traffic to idle pods instead, the overall error rate and response latency of the application can be significantly reduced.
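To make the idea concrete, the following Go snippet is a minimal, illustrative sketch of how a peak-EWMA-style score could be computed. It is not ASM's actual implementation: the decay window, the pending-request multiplier, and the endpoint names are assumptions, and ASM additionally factors in static weights and error rates. The sketch only shows the core idea that a latency spike raises a pod's score immediately and then decays slowly, so pods with higher scores receive less traffic.
package main

import (
	"fmt"
	"math"
	"time"
)

// endpoint tracks a latency EWMA for one backend pod (illustrative only).
type endpoint struct {
	name    string
	ewmaRTT float64   // exponentially weighted moving average of latency, in ms
	lastObs time.Time // time of the last latency observation
}

// decayWindow is a hypothetical constant that controls how quickly old
// latency observations lose influence.
const decayWindow = 10 * time.Second

// observe updates the EWMA. The "peak" behavior: a latency spike is adopted
// immediately, while lower latencies are blended in gradually, so a slow pod
// is not forgiven too quickly.
func (e *endpoint) observe(rttMs float64, now time.Time) {
	if rttMs > e.ewmaRTT {
		e.ewmaRTT = rttMs // jump straight to the observed peak
	} else {
		elapsed := now.Sub(e.lastObs).Seconds()
		alpha := math.Exp(-elapsed / decayWindow.Seconds())
		e.ewmaRTT = e.ewmaRTT*alpha + rttMs*(1-alpha)
	}
	e.lastObs = now
}

// score returns the cost of sending one more request to this endpoint;
// lower is better, and in-flight requests amplify the cost.
func (e *endpoint) score(pending int) float64 {
	return e.ewmaRTT * float64(pending+1)
}

func main() {
	now := time.Now()
	normal := &endpoint{name: "simple-server-normal", ewmaRTT: 80, lastObs: now}
	slow := &endpoint{name: "simple-server-high-latency", ewmaRTT: 80, lastObs: now}

	// The slow pod reports a 1.5 s response; its score jumps immediately.
	slow.observe(1500, now)
	fmt.Printf("%s score: %.0f\n", normal.name, normal.score(1))
	fmt.Printf("%s score: %.0f\n", slow.name, slow.score(1))
	// A load balancer that prefers lower scores now sends most traffic to
	// simple-server-normal until the slow pod's EWMA decays back down.
}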
Prerequisites
An ASM instance whose version is 1.21 or later is created. For more information, see Create an ASM instance.
A Container Service for Kubernetes (ACK) cluster is added to the ASM instance. For more information, see Add a cluster to an ASM instance and Update an ASM instance.
A kubectl client is connected to the cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Use peak EWMA
Log on to the ASM console. In the left-side navigation pane, choose Mesh Management.
On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose . On the page that appears, click Create from YAML.
Fill in the following sample code and click Create. The following sample YAML code specifies the PEAK_EWMA load balancing algorithm for the simple-server service in the default namespace.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: simple-server
  namespace: default
spec:
  host: simple-server.default.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: PEAK_EWMA # Uses the PEAK_EWMA load balancing algorithm of ASM.
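If you also manage Istio resources with kubectl through the kubeconfig of the ASM instance (not the kubeconfig of the ACK cluster), you can optionally confirm that the destination rule exists. The following command assumes that this kubectl access is already configured:
kubectl get destinationrule simple-server -n default -o yaml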
Example
Description
In this example, the sleep application functions as the client that sends test traffic, and the simple-server.default.svc.cluster.local service, which is backed by the simple-server application, functions as the server. The service is backed by two deployments with different configurations:
simple-server-normal: The response latency of this deployment ranges from 50 ms to 100 ms.
simple-server-high-latency: The response latency of this deployment ranges from 500 ms to 2000 ms. This deployment is used to simulate increased latency in some workloads of a service.
Step 1: Enable metric monitoring for the ASM instance
To visualize the benefits of the peak EWMA load balancing algorithm, this example enables metric monitoring for the ASM instance so that you can observe how the overall response latency of the service changes before and after peak EWMA is enabled. For more information about how to enable metric monitoring and collect metrics to Application Real-Time Monitoring Service (ARMS), see Collect metrics to Managed Service for Prometheus.
Step 2: Deploy the required environment
Use kubectl to connect to the ACK cluster based on the information in the kubeconfig file, and create a sleep.yaml file that contains the following content:
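A minimal sketch of what sleep.yaml can contain is shown below. It is modeled on the standard Istio sleep sample; the container image and ServiceAccount are assumptions, and your file may differ:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sleep
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
    spec:
      serviceAccountName: sleep
      containers:
      - name: sleep
        # curlimages/curl provides the curl client used by the test commands.
        image: curlimages/curl
        command: ["/bin/sleep", "infinity"]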
Run the following command to deploy the sleep application:
kubectl apply -f sleep.yaml
Create a simple.yaml file that contains the following content:
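A sketch of what simple.yaml can contain is shown below. The container image and the environment variables that control the simulated latency are hypothetical; only the deployment names, the service name, and port 8080 follow this example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-server-normal
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-server
      version: normal
  template:
    metadata:
      labels:
        app: simple-server
        version: normal
    spec:
      containers:
      - name: simple-server
        image: registry.example.com/simple-server:latest # hypothetical image
        ports:
        - containerPort: 8080
        env:
        - name: LATENCY_MIN_MS # hypothetical variables that simulate latency
          value: "50"
        - name: LATENCY_MAX_MS
          value: "100"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-server-high-latency
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-server
      version: high-latency
  template:
    metadata:
      labels:
        app: simple-server
        version: high-latency
    spec:
      containers:
      - name: simple-server
        image: registry.example.com/simple-server:latest # hypothetical image
        ports:
        - containerPort: 8080
        env:
        - name: LATENCY_MIN_MS
          value: "500"
        - name: LATENCY_MAX_MS
          value: "2000"
---
apiVersion: v1
kind: Service
metadata:
  name: simple-server
spec:
  selector:
    app: simple-server # selects the pods of both deployments
  ports:
  - name: http
    port: 8080
    targetPort: 8080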
Run the following command to deploy the simple-server-normal and simple-server-high-latency applications:
kubectl apply -f simple.yaml
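Optionally, confirm that the client and both server deployments are ready before you start the test:
kubectl get deploy sleep simple-server-normal simple-server-high-latency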
Step 3: Initiate a test with the default load balancing algorithm
The default load balancing algorithm, LEAST_REQUEST, is used in this test to generate baseline data.
Run the following command to initiate the test. 100 requests are sent to the /hello path of the simple-server service:
kubectl exec -it deploy/sleep -c sleep -- sh -c 'for i in $(seq 1 100); do time curl simple-server:8080/hello; echo "request $i done"; done'
Expected output:
hello this is port: 8080
real    0m 0.06s
user    0m 0.00s
sys     0m 0.00s
request 1 done
hello this is port: 8080
real    0m 0.09s
user    0m 0.00s
sys     0m 0.00s
request 2 done
......
hello this is port: 8080
real    0m 1.72s
user    0m 0.00s
sys     0m 0.00s
request 100 done
After the command is run, click the name of the desired ASM instance on the Mesh Management page. In the left-side navigation pane, choose . Click the Cloud ASM Istio Service tab and configure the following filter conditions:
Namespace: default
Service: simple-server.default.svc.cluster.local
Reporter: destination
Client Workload Namespace: default
Client Workload: sleep
Service Workload Namespace: default
Service Workload: simple-server-normal + simple-server-high-latency
Click Client Workloads to view the Incoming Request Duration By Source section.
It shows that the P50 response latency of requests from the sleep application to the simple-server service is 87.5 ms, and the P95 response latency rises significantly to 2.05 s. This is because the response latency of the simple-server-high-latency application is higher, which increases the overall response time of the simple-server service.
Important: The preceding test results are theoretical values obtained in a controlled experimental environment. Actual results may vary depending on your business environment.
Step 4: Configure the peak EWMA load balancing algorithm and perform a test again
Create a destination rule to configure the peak EWMA load balancing algorithm for the simple-server service.
Use the following YAML content to create a destination rule by referring to the steps in Use peak EWMA:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: simple-server
  namespace: default
spec:
  host: simple-server.default.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: PEAK_EWMA
Use kubectl to connect to the ACK cluster based on the information in the kubeconfig file, and run the following command to initiate a test again:
kubectl exec -it deploy/sleep -c sleep -- sh -c 'for i in $(seq 1 100); do time curl simple-server:8080/hello; echo "request $i done"; done'
Observe the test results by referring to Step 3. The results show that the P90, P95, and P99 latencies are significantly reduced. This is because the peak EWMA load balancing algorithm detects that the latency of the simple-server-high-latency workload is high and reduces its load balancing weight. As a result, more requests are routed to simple-server-normal, which has lower latency. From the perspective of the simple-server service, the overall request latency is significantly reduced.