
Alibaba Cloud Service Mesh:Load balancing based on workload latency using EWMA

Last Updated: Oct 08, 2024

Service Mesh (ASM) 1.21 introduces a new load balancing algorithm: peak exponentially weighted moving average (peak EWMA). This algorithm scores each node by computing a moving average over static weights, latencies, error rates, and other factors, and then selects suitable nodes for load balancing. When a backend service must handle burst traffic, the peak EWMA algorithm takes the peak load and real-time response time of each backend service pod into account and distributes traffic to the pods best able to absorb it. This topic describes how to configure and use peak EWMA to implement load balancing based on workload latency.

Background information

ASM provides a variety of common load balancing algorithms, including round robin, least request, and random. These algorithms meet the requirements of most business scenarios and deliver predictable performance. However, they select backend service pods based only on static rules, without considering the real-time status and performance of the pods.

For example, even if the resources on the host of a backend service pod are occupied by other applications, an ASM instance that uses the default load balancing algorithm still routes requests to this pod instead of to idle pods. As a result, the backend service responds with higher latency, or requests even time out. If the load balancing algorithm can intelligently skip the pod with degraded performance and route traffic to idle pods, the overall error rate and response latency of the application can be significantly reduced.
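As a rough illustration of the idea, the following Python sketch keeps a per-endpoint latency average that jumps immediately to latency peaks but decays smoothly as latencies recover, and always picks the endpoint with the lowest score. The class name, parameters, and decay logic are hypothetical simplifications, not ASM's internal implementation:

```python
import math

class PeakEwmaEndpoint:
    """Tracks a latency EWMA that reacts instantly to spikes but decays
    smoothly when latencies fall (hypothetical sketch only)."""

    def __init__(self, name, decay_seconds=10.0):
        self.name = name
        self.decay = decay_seconds
        self.ewma_ms = 0.0
        self.last_update = 0.0  # logical timestamp in seconds

    def observe(self, latency_ms, now):
        dt = max(now - self.last_update, 0.0)
        self.last_update = now
        if latency_ms > self.ewma_ms:
            # Peak behavior: jump straight up to a worse latency.
            self.ewma_ms = latency_ms
        else:
            # Otherwise decay the old average toward the new sample.
            w = math.exp(-dt / self.decay)
            self.ewma_ms = self.ewma_ms * w + latency_ms * (1 - w)

def pick(endpoints):
    # Route the next request to the endpoint with the lowest score.
    return min(endpoints, key=lambda e: e.ewma_ms)
```

A pod whose host becomes overloaded immediately acquires a high score and stops being selected, while its score gradually recovers once its latency drops again.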

Prerequisites

Use peak EWMA

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

  2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Traffic Management Center > DestinationRule. On the page that appears, click Create from YAML.

  3. Paste the following sample code and click Create. The sample YAML code specifies the PEAK_EWMA load balancing algorithm for the simple-server service in the default namespace.

    apiVersion: networking.istio.io/v1beta1
    kind: DestinationRule
    metadata:
      name: simple-server
      namespace: default
    spec:
      host: simple-server.default.svc.cluster.local
      trafficPolicy:
        loadBalancer:
          simple: PEAK_EWMA # Uses the PEAK_EWMA load balancing algorithm of ASM.

Example

Description

In this example, the sleep application functions as the client that sends test traffic, and the simple-server.default.svc.cluster.local service, which is backed by the simple-server application, functions as the server. The service is backed by two deployments with different configurations:

  • simple-server-normal: The response latency of this deployment ranges from 50 ms to 100 ms.

  • simple-server-high-latency: The response latency of this deployment ranges from 500 ms to 2000 ms. This deployment is used to simulate the increased latency of some workloads of a service.


Step 1: Enable metric monitoring for the ASM instance

To visualize the benefits of the peak EWMA load balancing algorithm, this example enables metric monitoring for the ASM instance so that you can observe how the overall response latency of the service changes before and after the algorithm is enabled. For more information about how to enable metric monitoring and collect metrics to Application Real-Time Monitoring Service (ARMS), see Collect metrics to Managed Service for Prometheus.

Step 2: Deploy the required environment

  1. Use kubectl to connect to the ACK cluster based on the information in the kubeconfig file, and create a sleep.yaml file that contains the following content:


    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: sleep
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: sleep
      labels:
        app: sleep
        service: sleep
    spec:
      ports:
      - port: 80
        name: http
      selector:
        app: sleep
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: sleep
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: sleep
      template:
        metadata:
          labels:
            app: sleep
        spec:
          terminationGracePeriodSeconds: 0
          serviceAccountName: sleep
          containers:
          - name: sleep
            image: curlimages/curl
            command: ["/bin/sleep", "infinity"]
            imagePullPolicy: IfNotPresent
            volumeMounts:
            - mountPath: /etc/sleep/tls
              name: secret-volume
          volumes:
          - name: secret-volume
            secret:
              secretName: sleep-secret
              optional: true
    ---

    Run the following command to deploy the sleep application:

    kubectl apply -f sleep.yaml
  2. Create a simple.yaml file that contains the following content:


    apiVersion: v1
    kind: Service
    metadata:
      name: simple-server
      labels:
        app: simple-server
        service: simple-server
    spec:
      ports:
      - port: 8080
        name: http
      selector:
        app: simple-server
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: simple-server
      name: simple-server-normal
      namespace: default
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: simple-server
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: simple-server
        spec:
          containers:
          - args:
            - --delayMin
            - "50"
            - --delayMax
            - "100"
            image: registry-cn-hangzhou.ack.aliyuncs.com/test-public/simple-server:v1.0.0.0-g88293ca-aliyun
            imagePullPolicy: IfNotPresent
            name: simple-server
            ports:
            - containerPort: 8080
              protocol: TCP
            resources:
              limits:
                cpu: 500m
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: simple-server
      name: simple-server-high-latency
      namespace: default
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: simple-server
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: simple-server
        spec:
          containers:
          - args:
            - --delayMin
            - "500"
            - --delayMax
            - "2000"
            image: registry-cn-hangzhou.ack.aliyuncs.com/test-public/simple-server:v1.0.0.0-g88293ca-aliyun
            imagePullPolicy: IfNotPresent
            name: simple-server
            ports:
            - containerPort: 8080
              protocol: TCP
            resources:
              limits:
                cpu: 500m
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
    ---

    Run the following command to deploy the simple-server-normal and simple-server-high-latency applications:

    kubectl apply -f simple.yaml

Step 3: Initiate a test with the default load balancing algorithm

The default load balancing algorithm, LEAST_REQUEST, is used in this test to generate baseline data. Because the test sends requests sequentially, every pod has zero outstanding requests at selection time, so LEAST_REQUEST effectively distributes requests evenly across the two deployments.
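For intuition, the following Python sketch (hypothetical names; a simplification of the power-of-two-choices variant of least-request selection) shows why sequential test traffic ends up spread roughly evenly: with no requests in flight, the outstanding-request counts always tie, and the choice degenerates to a random pick.

```python
import random
from collections import Counter

def least_request_pick(endpoints, outstanding):
    # Power-of-two-choices: sample two endpoints at random and keep the
    # one with fewer in-flight requests (simplified sketch; the actual
    # LEAST_REQUEST implementation differs in detail).
    a, b = random.sample(endpoints, 2)
    return a if outstanding[a] <= outstanding[b] else b

random.seed(1)
# Sequential curl requests: nothing is in flight at selection time,
# so both pods always tie and the choice is effectively random.
outstanding = {"simple-server-normal": 0, "simple-server-high-latency": 0}
picks = Counter(
    least_request_pick(list(outstanding), outstanding) for _ in range(1000)
)
print(picks)
```

Roughly half of the simulated requests land on each deployment, which is why the high-latency pods drag up the measured tail latency in the baseline test.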

  1. Run the following command to send 100 requests to the /hello path of the simple-server service:

    kubectl exec -it deploy/sleep -c sleep --  sh -c 'for i in $(seq 1 100); do time curl simple-server:8080/hello; echo "request $i done"; done'

    Expected output:

    hello
     this is port: 8080real 0m 0.06s
    user    0m 0.00s
    sys     0m 0.00s
    request 1 done
    hello
     this is port: 8080real 0m 0.09s
    user    0m 0.00s
    sys     0m 0.00s
    request 2 done
    
    ......
    
    hello
     this is port: 8080real 0m 1.72s
    user    0m 0.00s
    sys     0m 0.00s
    request 100 done
  2. After the command is run, click the name of the desired ASM instance on the Mesh Management page. In the left-side navigation pane, choose Observability Management Center > Monitoring metrics. Click the Cloud ASM Istio Service tab and configure the following filter conditions:

    • Namespace: default

    • Service: simple-server.default.svc.cluster.local

    • Reporter: destination

    • Client Workload Namespace: default

    • Client Workload: sleep

    • Service Workload Namespace: default

    • Service Workload: simple-server-normal + simple-server-high-latency

  3. Click Client Workloads to view the Incoming Request Duration By Source section.


    It shows that the P50 response latency of requests from the sleep application to the simple-server service is 87.5 ms, and the P95 response latency rises significantly to 2.05 s. This is because the response latency of the simple-server-high-latency application is higher, which increases the overall response time of the simple-server service.

    Important

    The preceding test results are theoretical values obtained in a controlled experimental environment. Actual results may vary depending on your business environment.

Step 4: Configure the peak EWMA load balancing algorithm and perform a test again

Create a destination rule to configure the peak EWMA load balancing algorithm for the simple-server service.

  1. Use the following YAML content to create a destination rule by referring to the steps in Use peak EWMA:

    apiVersion: networking.istio.io/v1beta1
    kind: DestinationRule
    metadata:
      name: simple-server
      namespace: default
    spec:
      host: simple-server.default.svc.cluster.local
      trafficPolicy:
        loadBalancer:
          simple: PEAK_EWMA
  2. Use kubectl to connect to the ACK cluster based on the information in the kubeconfig file, and run the following command to initiate a test again:

    kubectl exec -it deploy/sleep -c sleep --  sh -c 'for i in $(seq 1 100); do time curl simple-server:8080/hello; echo "request $i done"; done'
  3. Observe the test results by referring to Step 3. The results show that the P90, P95, and P99 latencies are significantly reduced. This is because the peak EWMA load balancing algorithm detects the high latency of the simple-server-high-latency workload and reduces its load balancing weight. As a result, more requests are routed to simple-server-normal, which has lower latency. From the perspective of the simple-server service, the overall request latency is significantly reduced.

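The improvement can also be reproduced numerically. The following Python sketch compares the median latency of an even 50/50 split against a latency-aware split that sends most traffic to the fast pod. The 95/5 split is an assumed stand-in for the weight reduction that peak EWMA applies, not a measured value:

```python
import random

def percentile(samples, p):
    # Nearest-rank percentile over a sorted copy of the samples.
    s = sorted(samples)
    return s[min(int(len(s) * p), len(s) - 1)]

random.seed(0)
normal = lambda: random.uniform(50, 100)    # simple-server-normal
slow = lambda: random.uniform(500, 2000)    # simple-server-high-latency

# Even split: half of the 100 requests hit the high-latency pod.
even = [normal() if i % 2 == 0 else slow() for i in range(100)]

# Latency-aware split: assume 95% of requests reach the fast pod.
aware = [normal() if random.random() < 0.95 else slow() for _ in range(100)]

print(percentile(even, 0.5), percentile(aware, 0.5))
```

With the even split, the median request is dominated by the slow pod; once the slow pod is down-weighted, the median drops into the 50 ms to 100 ms range of the normal deployment, mirroring the latency drop observed in the monitoring panel.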