
Alibaba Cloud Service Mesh:Use LoadRampingPolicy to implement progressive service release

Last Updated:Nov 22, 2024

The Service Mesh (ASM) traffic scheduling suite supports progressive service release policies. When you release a new service, you can configure a progressive release policy to gradually increase the traffic that the service receives, which ensures a smooth release. This topic describes how to use the LoadRampingPolicy provided by the traffic scheduling suite to implement progressive service release.

Background information

LoadRampingPolicy defines a progressive service release policy that gradually increases the number of requests a service receives, so that the service is released progressively. LoadRampingPolicy uses the following components and works in the following way:

  • Request sampler: LoadRampingPolicy uses a request sampler to reject a specific percentage of requests. In the early stage of the service release, the request sampler rejects a large percentage of the requests that are sent to the service.

  • Load meter: LoadRampingPolicy measures the service load by using a load meter. As long as the service load stays within the configured threshold range, the request sampler gradually reduces the percentage of rejected requests, step by step as defined in the policy, until almost all requests are accepted. In this way, the number of requests that the service receives is progressively increased.

When you release a new service in a cluster, you can use LoadRampingPolicy to progressively increase the traffic that the service receives. This prevents service errors caused by traffic bursts. At the same time, LoadRampingPolicy checks the service load in real time and gradually increases the percentage of traffic that the service receives, which facilitates a smooth release of the service.
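
The following is a rough sketch of how these two components map onto the fields of a LoadRampingPolicy. The complete example is shown in Step 1; only the structure is kept here and all values are elided. The drivers section configures the load meter and the latency criteria that advance or reset the ramp, and the load_ramp section configures the request sampler and its ramp steps.

    spec:
      drivers:                      # Load meter: latency criteria that advance or reset the ramp.
        average_latency_drivers: []
      load_ramp:
        sampler:                    # Request sampler: selects the services whose requests are sampled.
          selectors: []
        steps: []                   # Ramp schedule for the percentage of accepted requests.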

Prerequisites

  • The traffic scheduling suite is enabled for the ASM instance.

  • The HTTPBin application is deployed in the default namespace of the cluster on the data plane and can be accessed through an ASM ingress gateway.

Step 1: Create LoadRampingPolicy

  1. Use kubectl to connect to the ASM instance. For more information, see Use kubectl on the control plane to access Istio resources.

  2. Create a LoadRampingPolicy.yaml file that contains the following content:

    apiVersion: istio.alibabacloud.com/v1
    kind: LoadRampingPolicy
    metadata:
      name: load-ramping
      namespace: istio-system
    spec:
      drivers:
        average_latency_drivers:
          - selectors:
              - service: httpbin.default.svc.cluster.local
            criteria:
              forward:
                threshold: 100
              reset:
                threshold: 200
      start: true
      load_ramp:
        sampler:
          selectors:
            - service: httpbin.default.svc.cluster.local
        steps:
          - duration: 0s
            target_accept_percentage: 1
          - duration: 300s
            target_accept_percentage: 100.0

    The following list describes some of the fields. For more information about the related fields, see Description of LoadRampingPolicy fields.

      • steps: The release phases. In this example, two phases are defined: the sampler starts by accepting 1% of requests and then raises the accept percentage to 100% over 300 seconds.

      • selectors: The services to which the progressive release policy is applied. In this example, httpbin.default.svc.cluster.local is specified, which means that the progressive release is performed on the httpbin.default.svc.cluster.local service.

      • criteria: The benchmark used to measure the service load. In this example, the criteria field specifies the following behavior: (1) When the average service latency is less than 100 ms, the service release proceeds. (2) When the average service latency is greater than 200 ms, the service release is reset and the request sampler rejects requests at the maximum rejection percentage again.
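
    If you assume that the sampler increases the accept percentage linearly between two consecutive steps (an assumption used here only for illustration), the accept percentage about t seconds into the second phase is roughly 1 + (100 - 1) × t / 300. For example, about 150 seconds after the ramp starts, roughly 50% of requests are accepted, provided that the average latency stays below the 100 ms forward threshold.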

  3. Run the following command to configure the progressive service release policy:

    kubectl apply -f LoadRampingPolicy.yaml
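
    You can optionally confirm that the policy has been created. The resource kind and name in the following command come from the YAML file above:

    kubectl get loadrampingpolicy load-ramping -n istio-system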

Step 2: Verify whether LoadRampingPolicy takes effect

In this example, the stress testing tool Fortio is used. For more information, see the Installation section of Fortio on the GitHub website.
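
The test command sends requests through the ASM ingress gateway, so you need the IP address of the gateway first (also see the note in the following substep). The following command is a minimal sketch that assumes the gateway is exposed as a LoadBalancer Service named istio-ingressgateway in the istio-system namespace of the cluster on the data plane; adjust the Service name and namespace to match your gateway:

    kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}'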

  1. Run the following command to perform stress testing on the HTTPBin application:

    fortio load -c 10 -qps 0 -t 300s -allow-initial-errors -a http://${IP address of the ASM ingress gateway}/status/200
    Note

    Replace ${IP address of the ASM ingress gateway} in the preceding command with the IP address of your ASM ingress gateway. For more information about how to obtain the IP address of the ASM ingress gateway, see substep 1 of Step 3 in the Use Istio resources to route traffic to different versions of a service topic.

    Expected output:

    ...
    # target 50% 0.0613214
    # target 75% 0.0685102
    # target 90% 0.0756739
    # target 99% 0.0870132
    # target 99.9% 0.115361
    Sockets used: 31529 (for perfect keepalive, would be 10)
    Uniform: false, Jitter: false
    Code 200 : 26718 (45.9 %)
    Code 403 : 31510 (54.1 %)
    Response Header Sizes : count 58228 avg 111.04245 +/- 120.6 min 0 max 243 sum 6465780
    Response Body/Total Sizes : count 58228 avg 185.18012 +/- 52.32 min 137 max 243 sum 10782668
    All done 58228 calls (plus 10 warmup) 51.524 ms avg, 194.1 qps

    The output shows that the average request latency is about 51 ms, which is within the allowed range configured in this example. The 403 status code, which indicates that access to the requested resource is forbidden, is returned for about half of the requests. During the 300 seconds of the test, the percentage of requests that the service receives gradually increases from 1%. By the end of the test, the service receives 100% of the requests.

  2. Run the following command to perform stress testing on the HTTPBin application again:

    fortio load -c 10 -qps 0 -t 300s -allow-initial-errors -a http://${IP address of the ASM ingress gateway}/status/200

    Expected output:

    ...
    # target 50% 0.0337055
    # target 75% 0.0368905
    # target 90% 0.0396488
    # target 99% 0.0791
    # target 99.9% 0.123187
    Sockets used: 455 (for perfect keepalive, would be 10)
    Uniform: false, Jitter: false
    Code 200 : 82959 (99.5 %)
    Code 403 : 445 (0.5 %)
    Response Header Sizes : count 83404 avg 240.71018 +/- 17.63 min 0 max 243 sum 20076192
    Response Body/Total Sizes : count 83404 avg 241.44115 +/- 7.649 min 137 max 243 sum 20137158
    All done 83404 calls (plus 10 warmup) 35.970 ms avg, 278.0 qps

    The output shows that only 0.5% of requests are rejected and the service receives 99.5% of the requests. This indicates that the progressive service release is complete.

  3. Delete LoadRampingPolicy.

    1. Use kubectl to connect to the ASM instance. For more information, see Use kubectl on the control plane to access Istio resources.

    2. Run the following command to delete LoadRampingPolicy after the service is released:

    kubectl delete loadrampingpolicy load-ramping -n istio-system
    Important

    In this example, LoadRampingPolicy uses a 300-second ramp to simulate the progressive release of the service. Because the criteria.reset.threshold field is set, fluctuations in service latency could trigger the progressive release again after it completes. Therefore, after you verify the result of the progressive service release policy, manually delete LoadRampingPolicy to ensure that the service works as expected.

References

You can also verify whether LoadRampingPolicy takes effect in Grafana. Make sure that the Prometheus instance used by Grafana is configured to collect metrics from the ASM traffic scheduling suite.

You can import the following content into Grafana to create a dashboard for LoadRampingPolicy.


{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": 43,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": ""
        },
        "overrides": []
      },
      "gridPos": {
        "h": 10,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 1,
      "interval": "10s",
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "pluginVersion": "v10.1.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "(sum by (policy_name)(rate(sampler_counter_total{decision_type=\"DECISION_TYPE_ACCEPTED\"}[30s])) / sum by (policy_name)(rate(sampler_counter_total{}[30s]))) * 100",
          "intervalFactor": 1,
          "legendFormat": "policy_name={{policy_name}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "ACCEPT PERCENTAGE",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": ""
        },
        "overrides": []
      },
      "gridPos": {
        "h": 10,
        "w": 24,
        "x": 0,
        "y": 10
      },
      "id": 2,
      "interval": "10s",
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "pluginVersion": "v10.1.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "sum by (policy_name)(rate(sampler_counter_total{component_id=\"root.14.1\"}[$__rate_interval])) by (decision_type)",
          "intervalFactor": 1,
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Throughput - Accept/Reject",
      "type": "timeseries"
    }
  ],
  "refresh": false,
  "schemaVersion": 38,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "hide": 0,
        "includeAll": false,
        "label": "Data Source",
        "multi": false,
        "name": "datasource",
        "options": [],
        "query": "prometheus",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      }
    ]
  },
  "time": {
    "from": "now-5m",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "browser",
  "title": "Policy Summary - load-ramping",
  "version": 3,
  "weekStart": ""
}

The resulting dashboard shows the ACCEPT PERCENTAGE and Throughput - Accept/Reject panels for the load-ramping policy.
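
If you prefer to query Prometheus directly instead of importing the dashboard, you can use the expression from the ACCEPT PERCENTAGE panel in the preceding JSON, which returns the percentage of requests accepted by the sampler for each policy:

    (sum by (policy_name)(rate(sampler_counter_total{decision_type="DECISION_TYPE_ACCEPTED"}[30s])) / sum by (policy_name)(rate(sampler_counter_total{}[30s]))) * 100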