
Alibaba Cloud Service Mesh:Use LoadRampingPolicy to implement progressive service release

Last Updated:Nov 22, 2024

The Service Mesh (ASM) traffic scheduling suite supports progressive service release policies. When you release a new service, you can configure a progressive release policy to gradually increase the traffic that the service receives, which ensures a smooth release. This topic describes how to use the LoadRampingPolicy provided by the traffic scheduling suite to implement progressive service release.

Background information

LoadRampingPolicy defines a progressive service release policy that gradually increases the number of requests a service receives, so that the service is released progressively. LoadRampingPolicy uses the following components and works in the following way:

  • Request sampler: LoadRampingPolicy uses a request sampler to reject a specific percentage of requests. In the early stage of the service release, the request sampler rejects a large percentage of the requests that are sent to the service.

  • Load meter: LoadRampingPolicy measures the service load by using a load meter. As long as the service load stays within the configured threshold range, the request sampler gradually reduces the percentage of rejected requests, step by step as defined in the policy, until almost all requests are accepted. In this way, the number of requests that the service receives is progressively increased.

When you release a new service in a cluster, you can use LoadRampingPolicy to progressively increase the traffic that the service receives. This prevents service errors caused by traffic bursts. At the same time, LoadRampingPolicy checks the service load in real time and gradually increases the percentage of traffic that the service receives, which facilitates a smooth release of the service.
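
The following is a rough sketch of how these two components map onto the fields of a LoadRampingPolicy. The complete example is shown in Step 1; only the structure is kept here and all values are elided. The drivers section configures the load meter and the latency criteria that advance or reset the ramp, and the load_ramp section configures the request sampler and its ramp steps.

    spec:
      drivers:                      # Load meter: latency criteria that advance or reset the ramp.
        average_latency_drivers: []
      load_ramp:
        sampler:                    # Request sampler: selects the services whose requests are sampled.
          selectors: []
        steps: []                   # Ramp schedule for the percentage of accepted requests.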

Prerequisites

  • The traffic scheduling suite is enabled for the ASM instance.

  • The HTTPBin application is deployed in the default namespace of the cluster on the data plane and can be accessed through an ASM ingress gateway.

Step 1: Create LoadRampingPolicy

  1. Use kubectl to connect to the ASM instance. For more information, see Use kubectl on the control plane to access Istio resources.

  2. Create a LoadRampingPolicy.yaml file that contains the following content:

    apiVersion: istio.alibabacloud.com/v1
    kind: LoadRampingPolicy
    metadata:
      name: load-ramping
      namespace: istio-system
    spec:
      drivers:
        average_latency_drivers:
          - selectors:
              - service: httpbin.default.svc.cluster.local
            criteria:
              forward:
                threshold: 100
              reset:
                threshold: 200
      start: true
      load_ramp:
        sampler:
          selectors:
            - service: httpbin.default.svc.cluster.local
        steps:
          - duration: 0s
            target_accept_percentage: 1
          - duration: 300s
            target_accept_percentage: 100.0

    The following list describes some of the fields. For more information about the related fields, see Description of LoadRampingPolicy fields.

      • steps: The release phases. In this example, two phases are defined: the sampler starts by accepting 1% of requests and then raises the accept percentage to 100% over 300 seconds.

      • selectors: The services to which the progressive release policy is applied. In this example, httpbin.default.svc.cluster.local is specified, which means that the progressive release is performed on the httpbin.default.svc.cluster.local service.

      • criteria: The benchmark used to measure the service load. In this example, the criteria field specifies the following behavior: (1) When the average service latency is less than 100 ms, the service release proceeds. (2) When the average service latency is greater than 200 ms, the service release is reset and the request sampler rejects requests at the maximum rejection percentage again.
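
    If you assume that the sampler increases the accept percentage linearly between two consecutive steps (an assumption used here only for illustration), the accept percentage about t seconds into the second phase is roughly 1 + (100 - 1) × t / 300. For example, about 150 seconds after the ramp starts, roughly 50% of requests are accepted, provided that the average latency stays below the 100 ms forward threshold.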

  3. Run the following command to configure the progressive service release policy:

    kubectl apply -f LoadRampingPolicy.yaml
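
    You can optionally confirm that the policy has been created. The resource kind and name in the following command come from the YAML file above:

    kubectl get loadrampingpolicy load-ramping -n istio-system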

Step 2: Verify whether LoadRampingPolicy takes effect

In this example, the stress testing tool Fortio is used. For more information, see the Installation section of Fortio on the GitHub website.
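
The test command sends requests through the ASM ingress gateway, so you need the IP address of the gateway first (also see the note in the following substep). The following command is a minimal sketch that assumes the gateway is exposed as a LoadBalancer Service named istio-ingressgateway in the istio-system namespace of the cluster on the data plane; adjust the Service name and namespace to match your gateway:

    kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}'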

  1. Run the following command to perform stress testing on the HTTPBin application:

    fortio load -c 10 -qps 0 -t 300s -allow-initial-errors -a http://${IP address of the ASM ingress gateway}/status/200
    Note

    Replace ${IP address of the ASM ingress gateway} in the preceding command with the IP address of your ASM ingress gateway. For more information about how to obtain the IP address of the ASM ingress gateway, see substep 1 of Step 3 in the Use Istio resources to route traffic to different versions of a service topic.

    Expected output:

    ...
    # target 50% 0.0613214
    # target 75% 0.0685102
    # target 90% 0.0756739
    # target 99% 0.0870132
    # target 99.9% 0.115361
    Sockets used: 31529 (for perfect keepalive, would be 10)
    Uniform: false, Jitter: false
    Code 200 : 26718 (45.9 %)
    Code 403 : 31510 (54.1 %)
    Response Header Sizes : count 58228 avg 111.04245 +/- 120.6 min 0 max 243 sum 6465780
    Response Body/Total Sizes : count 58228 avg 185.18012 +/- 52.32 min 137 max 243 sum 10782668
    All done 58228 calls (plus 10 warmup) 51.524 ms avg, 194.1 qps

    The output shows that the average request latency is about 51 ms, which is within the allowed range configured in this example. The 403 status code, which indicates that access to the requested resource is forbidden, is returned for about half of the requests. During the 300 seconds of the test, the percentage of requests that the service receives gradually increases from 1%. By the end of the test, the service receives 100% of the requests.

  2. Run the following command to perform stress testing on the HTTPBin application again:

    fortio load -c 10 -qps 0 -t 300s -allow-initial-errors -a http://${IP address of the ASM ingress gateway}/status/200

    Expected output:

    ...
    # target 50% 0.0337055
    # target 75% 0.0368905
    # target 90% 0.0396488
    # target 99% 0.0791
    # target 99.9% 0.123187
    Sockets used: 455 (for perfect keepalive, would be 10)
    Uniform: false, Jitter: false
    Code 200 : 82959 (99.5 %)
    Code 403 : 445 (0.5 %)
    Response Header Sizes : count 83404 avg 240.71018 +/- 17.63 min 0 max 243 sum 20076192
    Response Body/Total Sizes : count 83404 avg 241.44115 +/- 7.649 min 137 max 243 sum 20137158
    All done 83404 calls (plus 10 warmup) 35.970 ms avg, 278.0 qps

    The output shows that only 0.5% of requests are rejected and the service receives 99.5% of the requests. This indicates that the progressive service release is complete.

  3. Delete LoadRampingPolicy.

    1. Use kubectl to connect to the ASM instance. For more information, see Use kubectl on the control plane to access Istio resources.

    2. Run the following command to delete LoadRampingPolicy after the service is released:

    kubectl delete loadrampingpolicy load-ramping -n istio-system
    Important

    In this example, LoadRampingPolicy uses a 300-second ramp to simulate the progressive release of the service. Because the criteria.reset.threshold field is set, fluctuations in service latency could trigger the progressive release again after it completes. Therefore, after you verify the result of the progressive service release policy, manually delete LoadRampingPolicy to ensure that the service works as expected.

References

You can also verify whether LoadRampingPolicy takes effect in Grafana. Make sure that the Prometheus instance used by Grafana is configured to collect metrics from the ASM traffic scheduling suite.

You can import the following content into Grafana to create a dashboard for LoadRampingPolicy.


{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": 43,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": ""
        },
        "overrides": []
      },
      "gridPos": {
        "h": 10,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 1,
      "interval": "10s",
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "pluginVersion": "v10.1.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "(sum by (policy_name)(rate(sampler_counter_total{decision_type=\"DECISION_TYPE_ACCEPTED\"}[30s])) / sum by (policy_name)(rate(sampler_counter_total{}[30s]))) * 100",
          "intervalFactor": 1,
          "legendFormat": "policy_name={{policy_name}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "ACCEPT PERCENTAGE",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": ""
        },
        "overrides": []
      },
      "gridPos": {
        "h": 10,
        "w": 24,
        "x": 0,
        "y": 10
      },
      "id": 2,
      "interval": "10s",
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "pluginVersion": "v10.1.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "sum by (policy_name)(rate(sampler_counter_total{component_id=\"root.14.1\"}[$__rate_interval])) by (decision_type)",
          "intervalFactor": 1,
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Throughput - Accept/Reject",
      "type": "timeseries"
    }
  ],
  "refresh": false,
  "schemaVersion": 38,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "hide": 0,
        "includeAll": false,
        "label": "Data Source",
        "multi": false,
        "name": "datasource",
        "options": [],
        "query": "prometheus",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      }
    ]
  },
  "time": {
    "from": "now-5m",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "browser",
  "title": "Policy Summary - load-ramping",
  "version": 3,
  "weekStart": ""
}

The resulting dashboard shows the ACCEPT PERCENTAGE and Throughput - Accept/Reject panels for the load-ramping policy.
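
If you prefer to query Prometheus directly instead of importing the dashboard, you can use the expression from the ACCEPT PERCENTAGE panel in the preceding JSON, which returns the percentage of requests accepted by the sampler for each policy:

    (sum by (policy_name)(rate(sampler_counter_total{decision_type="DECISION_TYPE_ACCEPTED"}[30s])) / sum by (policy_name)(rate(sampler_counter_total{}[30s]))) * 100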