In version 1.21 and later, Service Mesh (ASM) provides a new load balancing algorithm: peak exponentially weighted moving average (peak EWMA). This algorithm computes a moving average over static weights, latencies, error rates, and other factors to score nodes, and then selects suitable nodes for load balancing. When a backend service needs to handle burst traffic, ASM can use the peak EWMA load balancing algorithm to take the maximum load and real-time response time of backend service pods into account and flexibly distribute traffic to suitable pods. This topic describes how to configure and use peak EWMA to implement load balancing based on workload latency.
Background information
ASM provides a variety of common load balancing algorithms, including round robin, least request, and random. These algorithms meet the requirements of most business scenarios and deliver predictable performance. However, they select backend service pods only based on static rules and do not consider the real-time status and performance of the pods.
For example, even if the resources on the host of a backend service pod are occupied by other applications, an ASM instance that uses the default load balancing algorithm may still select this pod instead of other idle pods. As a result, the backend service responds to requests with higher latency, or requests even time out. If the load balancing algorithm can intelligently bypass the pod with deteriorated performance and route traffic to idle pods instead, the overall error rate and response latency of the application can be significantly reduced.
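To make the idea concrete, the following Go snippet is a minimal, illustrative sketch of how a peak-EWMA-style score could be computed. It is not ASM's actual implementation: the decay window, the pending-request multiplier, and the endpoint names are assumptions, and ASM additionally factors in static weights and error rates. The sketch only shows the core idea that a latency spike raises a pod's score immediately and then decays slowly, so pods with higher scores receive less traffic.
package main

import (
	"fmt"
	"math"
	"time"
)

// endpoint tracks a latency EWMA for one backend pod (illustrative only).
type endpoint struct {
	name    string
	ewmaRTT float64   // exponentially weighted moving average of latency, in ms
	lastObs time.Time // time of the last latency observation
}

// decayWindow is a hypothetical constant that controls how quickly old
// latency observations lose influence.
const decayWindow = 10 * time.Second

// observe updates the EWMA. The "peak" behavior: a latency spike is adopted
// immediately, while lower latencies are blended in gradually, so a slow pod
// is not forgiven too quickly.
func (e *endpoint) observe(rttMs float64, now time.Time) {
	if rttMs > e.ewmaRTT {
		e.ewmaRTT = rttMs // jump straight to the observed peak
	} else {
		elapsed := now.Sub(e.lastObs).Seconds()
		alpha := math.Exp(-elapsed / decayWindow.Seconds())
		e.ewmaRTT = e.ewmaRTT*alpha + rttMs*(1-alpha)
	}
	e.lastObs = now
}

// score returns the cost of sending one more request to this endpoint;
// lower is better, and in-flight requests amplify the cost.
func (e *endpoint) score(pending int) float64 {
	return e.ewmaRTT * float64(pending+1)
}

func main() {
	now := time.Now()
	normal := &endpoint{name: "simple-server-normal", ewmaRTT: 80, lastObs: now}
	slow := &endpoint{name: "simple-server-high-latency", ewmaRTT: 80, lastObs: now}

	// The slow pod reports a 1.5 s response; its score jumps immediately.
	slow.observe(1500, now)
	fmt.Printf("%s score: %.0f\n", normal.name, normal.score(1))
	fmt.Printf("%s score: %.0f\n", slow.name, slow.score(1))
	// A load balancer that prefers lower scores now sends most traffic to
	// simple-server-normal until the slow pod's EWMA decays back down.
}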
Prerequisites
An ASM instance whose version is 1.21 or later is created. For more information, see Create an ASM instance.
A Container Service for Kubernetes (ACK) cluster is added to the ASM instance. For more information, see Add a cluster to an ASM instance and Update an ASM instance.
A kubectl client is connected to the cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Use peak EWMA
Log on to the ASM console. In the left-side navigation pane, choose Mesh Management.
On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose . On the page that appears, click Create from YAML.
Fill in the following sample code and click Create. The following sample YAML code specifies the PEAK_EWMA load balancing algorithm for the simple-server service in the default namespace.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: simple-server
  namespace: default
spec:
  host: simple-server.default.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: PEAK_EWMA # Uses the PEAK_EWMA load balancing algorithm of ASM.
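If you also manage Istio resources with kubectl through the kubeconfig of the ASM instance (not the kubeconfig of the ACK cluster), you can optionally confirm that the destination rule exists. The following command assumes that this kubectl access is already configured:
kubectl get destinationrule simple-server -n default -o yaml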
Example
Description
In this example, the sleep application functions as the client that sends test traffic, and the simple-server.default.svc.cluster.local service, which is backed by the simple-server application, functions as the server. The service is backed by two deployments with different configurations:
simple-server-normal: The response latency of this deployment ranges from 50 ms to 100 ms.
simple-server-high-latency: The response latency of this deployment ranges from 500 ms to 2000 ms. This deployment is used to simulate increased latency in some workloads of a service.
Step 1: Enable metric monitoring for the ASM instance
To visualize the benefits of the peak EWMA load balancing algorithm, this example enables metric monitoring for the ASM instance so that you can observe how the overall response latency of the service changes before and after peak EWMA is enabled. For more information about how to enable metric monitoring and collect metrics to Application Real-Time Monitoring Service (ARMS), see Collect metrics to Managed Service for Prometheus.
Step 2: Deploy the required environment
Use kubectl to connect to the ACK cluster based on the information in the kubeconfig file, and create a sleep.yaml file that contains the following content:
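A minimal sketch of what sleep.yaml can contain is shown below. It is modeled on the standard Istio sleep sample; the container image and ServiceAccount are assumptions, and your file may differ:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sleep
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
    spec:
      serviceAccountName: sleep
      containers:
      - name: sleep
        # curlimages/curl provides the curl client used by the test commands.
        image: curlimages/curl
        command: ["/bin/sleep", "infinity"]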
Run the following command to deploy the sleep application:
kubectl apply -f sleep.yaml
Create a simple.yaml file that contains the following content:
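A sketch of what simple.yaml can contain is shown below. The container image and the environment variables that control the simulated latency are hypothetical; only the deployment names, the service name, and port 8080 follow this example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-server-normal
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-server
      version: normal
  template:
    metadata:
      labels:
        app: simple-server
        version: normal
    spec:
      containers:
      - name: simple-server
        image: registry.example.com/simple-server:latest # hypothetical image
        ports:
        - containerPort: 8080
        env:
        - name: LATENCY_MIN_MS # hypothetical variables that simulate latency
          value: "50"
        - name: LATENCY_MAX_MS
          value: "100"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-server-high-latency
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-server
      version: high-latency
  template:
    metadata:
      labels:
        app: simple-server
        version: high-latency
    spec:
      containers:
      - name: simple-server
        image: registry.example.com/simple-server:latest # hypothetical image
        ports:
        - containerPort: 8080
        env:
        - name: LATENCY_MIN_MS
          value: "500"
        - name: LATENCY_MAX_MS
          value: "2000"
---
apiVersion: v1
kind: Service
metadata:
  name: simple-server
spec:
  selector:
    app: simple-server # selects the pods of both deployments
  ports:
  - name: http
    port: 8080
    targetPort: 8080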
Run the following command to deploy the simple-server-normal and simple-server-high-latency applications:
kubectl apply -f simple.yaml
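Optionally, confirm that the client and both server deployments are ready before you start the test:
kubectl get deploy sleep simple-server-normal simple-server-high-latency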
Step 3: Initiate a test with the default load balancing algorithm
The default load balancing algorithm, LEAST_REQUEST, is used in this test to generate baseline data.
Run the following command to initiate the test. 100 requests are sent to the /hello path of the simple-server service:
kubectl exec -it deploy/sleep -c sleep -- sh -c 'for i in $(seq 1 100); do time curl simple-server:8080/hello; echo "request $i done"; done'
Expected output:
hello this is port: 8080
real    0m 0.06s
user    0m 0.00s
sys     0m 0.00s
request 1 done
hello this is port: 8080
real    0m 0.09s
user    0m 0.00s
sys     0m 0.00s
request 2 done
......
hello this is port: 8080
real    0m 1.72s
user    0m 0.00s
sys     0m 0.00s
request 100 done
After the command is run, click the name of the desired ASM instance on the Mesh Management page. In the left-side navigation pane, choose . Click the Cloud ASM Istio Service tab and configure the following filter conditions:
Namespace: default
Service: simple-server.default.svc.cluster.local
Reporter: destination
Client Workload Namespace: default
Client Workload: sleep
Service Workload Namespace: default
Service Workload: simple-server-normal + simple-server-high-latency
Click Client Workloads to view the Incoming Request Duration By Source section.
It shows that the P50 response latency of requests from the sleep application to the simple-server service is 87.5 ms, and the P95 response latency rises significantly to 2.05 s. This is because the response latency of the simple-server-high-latency application is higher, which increases the overall response time of the simple-server service.
Important: The preceding test results are theoretical values obtained in a controlled experimental environment. Actual results may vary depending on your business environment.
Step 4: Configure the peak EWMA load balancing algorithm and perform a test again
Create a destination rule to configure the peak EWMA load balancing algorithm for the simple-server service.
Use the following YAML content to create a destination rule by referring to the steps in Use peak EWMA:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: simple-server
  namespace: default
spec:
  host: simple-server.default.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: PEAK_EWMA
Use kubectl to connect to the ACK cluster based on the information in the kubeconfig file, and run the following command to initiate a test again:
kubectl exec -it deploy/sleep -c sleep -- sh -c 'for i in $(seq 1 100); do time curl simple-server:8080/hello; echo "request $i done"; done'
Observe the test results by referring to Step 3. The results show that the P90, P95, and P99 latencies are significantly reduced. This is because the peak EWMA load balancing algorithm detects that the latency of the simple-server-high-latency workload is high and reduces its load balancing weight. As a result, more requests are routed to simple-server-normal, which has lower latency. From the perspective of the simple-server service, the overall request latency is significantly reduced.