Use LoadRampingPolicy to roll out a service progressively - Service Mesh ASM

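The ASM traffic scheduling suite supports progressive service rollout. When you release a new service, you can configure a progressive rollout policy alongside it so that the traffic the service receives increases gradually and the service comes online smoothly. This topic describes how to use the LoadRampingPolicy provided by the traffic scheduling suite to roll out a service progressively.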

Background information

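LoadRampingPolicy rolls out a service progressively by gradually increasing the share of requests the service accepts. The components involved and the general workflow are as follows:

Request sampler: LoadRampingPolicy uses a request sampler to reject a configurable percentage of requests. Early in the rollout, the sampler rejects a large percentage of the requests sent to the service.
Load meter: LoadRampingPolicy uses a load meter to determine the load state of the service. While the load state stays within the given thresholds, the request sampler lowers the rejection percentage according to the defined steps until it accepts almost all requests, so the load on the service increases gradually.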
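
To make the sampler's role concrete, here is a minimal Python sketch of a probabilistic request sampler. It assumes that each request is admitted independently with probability accept_percentage / 100; RequestSampler and accepted_share are hypothetical names for illustration, not part of ASM, and the real sampler may decide differently (for example, based on request attributes).

```python
import random


class RequestSampler:
    """Hypothetical, simplified request sampler (not the ASM implementation).

    Each request is admitted independently with probability
    accept_percentage / 100; a rejected request would get HTTP 403.
    """

    def __init__(self, accept_percentage: float, seed: int = 0) -> None:
        self.accept_percentage = accept_percentage
        self._rng = random.Random(seed)  # seeded so runs are repeatable

    def admit(self) -> bool:
        # random() is uniform on [0.0, 1.0); scale it to [0, 100).
        return self._rng.random() * 100 < self.accept_percentage


def accepted_share(sampler: RequestSampler, n: int) -> float:
    """Percentage of n simulated requests that the sampler admits."""
    admitted = sum(1 for _ in range(n) if sampler.admit())
    return 100.0 * admitted / n


if __name__ == "__main__":
    for pct in (1, 50, 100):
        share = accepted_share(RequestSampler(pct), 100_000)
        print(f"target {pct}% -> admitted {share:.2f}%")
```

Lowering the rejection percentage over time then amounts to raising accept_percentage step by step, which is what the ramp steps control.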

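When you roll out a new service in a cluster, the progressive rollout policy gradually increases the traffic the service receives, which keeps a freshly deployed service from failing under a sudden burst of traffic. At the same time, the policy monitors the service load in real time and raises the accepted traffic percentage step by step, so the service comes online smoothly.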

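Prerequisites

A Container Service for Kubernetes (ACK) managed cluster is added to the ASM instance, and the ASM instance is version 1.21.X.XX or later. For more information, see Add a cluster to an ASM instance.
Automatic sidecar injection is enabled for the default namespace of the Kubernetes cluster. For more information, see Manage global namespaces.
kubectl is connected to the ACK cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
The ASM traffic scheduling suite is enabled. For more information, see Enable the ASM traffic scheduling suite.
The httpbin application is deployed and can be accessed through the ingress gateway. For more information, see Deploy the httpbin application.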

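Step 1: Create a LoadRampingPolicy

Use kubectl to connect to the ASM instance. For more information, see Use kubectl on the control plane to access Istio resources.

Create a file named LoadRampingPolicy.yaml with the following content.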

```
apiVersion: istio.alibabacloud.com/v1
kind: LoadRampingPolicy
metadata:
  name: load-ramping
  namespace: istio-system
spec:
  drivers:
    average_latency_drivers:
      - selectors:
          - service: httpbin.default.svc.cluster.local
        criteria:
          forward:
            threshold: 100   # advance the ramp while average latency < 100 ms
          reset:
            threshold: 200   # restart the ramp if average latency > 200 ms
  start: true
  load_ramp:
    sampler:
      selectors:
        - service: httpbin.default.svc.cluster.local
    steps:
      - duration: 0s
        target_accept_percentage: 1       # start by accepting 1% of requests
      - duration: 300s
        target_accept_percentage: 100.0   # ramp up to 100% over 300 seconds
```

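Some of the fields are described below. For more information about the fields, see the LoadRampingPolicy CRD reference.

Field	Description
steps	The rollout stages. The example defines two stages: the request sampler first accepts 1% of requests, then raises the accepted percentage to close to 100% over 300 seconds.
selectors	The services to which the progressive rollout policy applies. The example rolls out the httpbin.default.svc.cluster.local service.
criteria	The baseline for measuring service load. In the example, the rollout advances through its steps while the average latency of the service is below 100 ms. If the average latency exceeds 200 ms, the rollout is reset and the request sampler starts again from the highest rejection percentage.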
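
The interaction between steps and criteria can be modeled with a small Python sketch. It assumes that the accept percentage moves linearly within each step, that the ramp clock advances only while average latency is below criteria.forward.threshold, that it restarts from the first step when latency exceeds criteria.reset.threshold, and that it holds between the two thresholds. These are simplifying assumptions for illustration; Step, target_accept, and RampController are hypothetical names and do not describe the actual ASM implementation.

```python
from dataclasses import dataclass


@dataclass
class Step:
    duration_s: float                # mirrors steps[].duration
    target_accept_percentage: float  # mirrors steps[].target_accept_percentage


# Values copied from the example policy above.
STEPS = [Step(0, 1.0), Step(300, 100.0)]
FORWARD_THRESHOLD_MS = 100  # criteria.forward.threshold
RESET_THRESHOLD_MS = 200    # criteria.reset.threshold


def target_accept(elapsed_s: float, steps: list[Step]) -> float:
    """Accept percentage after elapsed_s seconds of (healthy) ramping.

    ASSUMPTION: within a step the percentage moves linearly from the
    previous step's target to this step's target; a 0s step jumps at once.
    """
    current = steps[0].target_accept_percentage
    remaining = elapsed_s
    for step in steps:
        if step.duration_s == 0:
            current = step.target_accept_percentage
            continue
        if remaining < step.duration_s:
            fraction = remaining / step.duration_s
            return current + (step.target_accept_percentage - current) * fraction
        remaining -= step.duration_s
        current = step.target_accept_percentage
    return current  # every step has finished


class RampController:
    """Hypothetical model of how the latency criteria drive the ramp clock."""

    def __init__(self) -> None:
        self.elapsed_s = 0.0

    def tick(self, avg_latency_ms: float, dt_s: float) -> float:
        """Feed one latency reading covering dt_s seconds; return the new target."""
        if avg_latency_ms > RESET_THRESHOLD_MS:
            self.elapsed_s = 0.0    # reset: start over from the first step
        elif avg_latency_ms < FORWARD_THRESHOLD_MS:
            self.elapsed_s += dt_s  # healthy: move the ramp forward
        # ASSUMPTION: between the two thresholds the ramp neither moves nor resets.
        return target_accept(self.elapsed_s, STEPS)


if __name__ == "__main__":
    ctl = RampController()
    for latency_ms, dt_s in [(50, 150), (150, 60), (250, 10), (40, 300)]:
        print(f"latency={latency_ms} ms -> accept {ctl.tick(latency_ms, dt_s):.1f}%")
```

Under these assumptions, 150 seconds of healthy latency brings the target to 50.5%, a reading between 100 ms and 200 ms leaves it there, and a reading above 200 ms drops it back to 1%.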

Run the following command to apply the progressive rollout policy.
```
kubectl apply -f LoadRampingPolicy.yaml
```

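Step 2: Test the progressive rollout

This step uses the load testing tool fortio. For installation instructions, see Install fortio.

Run the following command to start a load test against the httpbin service. The command opens 10 concurrent connections (-c 10), sends requests as fast as possible (-qps 0), and runs for 300 seconds (-t 300s); -allow-initial-errors keeps fortio from aborting when warmup requests fail, and -a saves the results to an automatically named JSON file.

```
fortio load -c 10 -qps 0 -t 300s -allow-initial-errors -a http://${ASM网关IP}/status/200
```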

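Note

Replace ${ASM网关IP} in the command with the IP address of the ASM ingress gateway. For how to obtain the gateway IP address, see Use Istio resources to route traffic to different versions of a service.

Expected output: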

```
...
# target 50% 0.0613214
# target 75% 0.0685102
# target 90% 0.0756739
# target 99% 0.0870132
# target 99.9% 0.115361
Sockets used: 31529 (for perfect keepalive, would be 10)
Uniform: false, Jitter: false
Code 200 : 26718 (45.9 %)
Code 403 : 31510 (54.1 %)
Response Header Sizes : count 58228 avg 111.04245 +/- 120.6 min 0 max 243 sum 6465780
Response Body/Total Sizes : count 58228 avg 185.18012 +/- 52.32 min 137 max 243 sum 10782668
All done 58228 calls (plus 10 warmup) 51.524 ms avg, 194.1 qps
```

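The output shows an average request latency of about 51 ms, below the 100 ms forward threshold configured in this example, and about half of the requests (54.1%) received a 403 status code, which means they were rejected. Over the 300-second test, the share of accepted requests rose gradually from 1%, and by the end of the test the service accepted 100% of the traffic.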
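
As a plausibility check on the first run, the following Python sketch recomputes the status-code percentages from the counts fortio reports and compares them with a purely hypothetical linear ramp from 1% to 100% over 300 seconds at a constant request rate. The real policy need not interpolate linearly, and fortio's 51.5 ms average is measured at the client over all calls, including rejected ones, so it only approximates the service latency that the policy evaluates.

```python
# Counts copied from the first fortio run above.
ACCEPTED_200 = 26718
REJECTED_403 = 31510
TOTAL_CALLS = 58228
CLIENT_AVG_LATENCY_MS = 51.524
FORWARD_THRESHOLD_MS = 100  # criteria.forward.threshold in the example policy


def percentage(part: int, whole: int) -> float:
    return 100.0 * part / whole


def linear_ramp_mean(start_pct: float, end_pct: float, seconds: int) -> float:
    """Time-averaged accept percentage of a HYPOTHETICAL linear ramp.

    Samples the ramp at the midpoint of each one-second interval.
    """
    samples = [
        start_pct + (end_pct - start_pct) * (i + 0.5) / seconds
        for i in range(seconds)
    ]
    return sum(samples) / len(samples)


if __name__ == "__main__":
    assert ACCEPTED_200 + REJECTED_403 == TOTAL_CALLS  # no other status codes
    print(f"accepted: {percentage(ACCEPTED_200, TOTAL_CALLS):.1f}%")  # accepted: 45.9%
    print(f"rejected: {percentage(REJECTED_403, TOTAL_CALLS):.1f}%")  # rejected: 54.1%
    print(f"linear-ramp mean: {linear_ramp_mean(1, 100, 300):.1f}%")  # linear-ramp mean: 50.5%
    print("below forward threshold:", CLIENT_AVG_LATENCY_MS < FORWARD_THRESHOLD_MS)  # True
```

The observed 45.9% accepted share is in the same range as the 50.5% of the idealized ramp, which is all this check is meant to show.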

Run the same load test against the httpbin service again.

```
fortio load -c 10 -qps 0 -t 300s -allow-initial-errors -a http://${ASM网关IP}/status/200
```

Expected output:

```
...
# target 50% 0.0337055
# target 75% 0.0368905
# target 90% 0.0396488
# target 99% 0.0791
# target 99.9% 0.123187
Sockets used: 455 (for perfect keepalive, would be 10)
Uniform: false, Jitter: false
Code 200 : 82959 (99.5 %)
Code 403 : 445 (0.5 %)
Response Header Sizes : count 83404 avg 240.71018 +/- 17.63 min 0 max 243 sum 20076192
Response Body/Total Sizes : count 83404 avg 241.44115 +/- 7.649 min 137 max 243 sum 20137158
All done 83404 calls (plus 10 warmup) 35.970 ms avg, 278.0 qps
```

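The expected output shows that only 0.5% of the requests are rejected and the accepted traffic has reached its 100% target, which means the progressive rollout is complete.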
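
If you run the load test repeatedly, reading the rejection ratio off by hand gets tedious. The Python sketch below parses fortio's `Code <status> : <count> (<pct> %)` summary lines, in the format shown in the expected output above, and computes the share of 403 responses, which this topic treats as requests rejected by the sampler. The function names are hypothetical; because fortio's text summary is meant for people, the JSON result file that -a saves may be more robust to parse in automation.

```python
import re

# Matches fortio summary lines such as "Code 403 : 445 (0.5 %)".
# fortio reports socket-level errors with a negative code such as -1.
CODE_LINE = re.compile(r"^Code\s+(-?\d+)\s*:\s*(\d+)\s*\(")


def status_counts(fortio_output: str) -> dict[int, int]:
    """Map each status code to the number of responses fortio reported."""
    counts: dict[int, int] = {}
    for line in fortio_output.splitlines():
        match = CODE_LINE.match(line.strip())
        if match:
            counts[int(match.group(1))] = int(match.group(2))
    return counts


def reject_percentage(counts: dict[int, int], reject_code: int = 403) -> float:
    """Percentage of all responses that carried reject_code."""
    total = sum(counts.values())
    return 100.0 * counts.get(reject_code, 0) / total if total else 0.0


# Lines copied from the second fortio run above.
SECOND_RUN = """\
Code 200 : 82959 (99.5 %)
Code 403 : 445 (0.5 %)
All done 83404 calls (plus 10 warmup) 35.970 ms avg, 278.0 qps
"""

if __name__ == "__main__":
    counts = status_counts(SECOND_RUN)
    print(counts)  # {200: 82959, 403: 445}
    print(f"rejected: {reject_percentage(counts):.1f}%")  # rejected: 0.5%
```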

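Delete the progressive rollout policy.
1. Use kubectl to connect to the ASM instance. For more information, see Use kubectl on the control plane to access Istio resources.
2. Run the following command to delete the progressive rollout policy. The service rollout is then complete.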
```
kubectl delete loadrampingpolicy load-ramping -n istio-system
```
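Important
The LoadRampingPolicy in this example uses a 300-second ramp to simulate a progressive service rollout. Because criteria.reset.threshold is configured, delete the policy manually once verification is complete. Otherwise, a latency fluctuation above the reset threshold could trigger the progressive rollout again and disrupt normal operation of the service.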

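Related operations

You can observe the effect of the LoadRampingPolicy on a Grafana dashboard. Make sure that the Prometheus instance used as the Grafana data source is configured to collect the metrics of the ASM traffic scheduling suite.

Import the following JSON into Grafana to create a dashboard for the LoadRampingPolicy.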

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": 43,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": ""
        },
        "overrides": []
      },
      "gridPos": {
        "h": 10,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 1,
      "interval": "10s",
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "pluginVersion": "v10.1.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "(sum by (policy_name)(rate(sampler_counter_total{decision_type=\"DECISION_TYPE_ACCEPTED\"}[30s])) / sum by (policy_name)(rate(sampler_counter_total{}[30s]))) * 100",
          "intervalFactor": 1,
          "legendFormat": "policy_name={{policy_name}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "ACCEPT PERCENTAGE",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": ""
        },
        "overrides": []
      },
      "gridPos": {
        "h": 10,
        "w": 24,
        "x": 0,
        "y": 10
      },
      "id": 2,
      "interval": "10s",
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "pluginVersion": "v10.1.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "sum by (policy_name, decision_type)(rate(sampler_counter_total{component_id=\"root.14.1\"}[$__rate_interval]))",
          "intervalFactor": 1,
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Throughput - Accept/Reject",
      "type": "timeseries"
    }
  ],
  "refresh": false,
  "schemaVersion": 38,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "hide": 0,
        "includeAll": false,
        "label": "Data Source",
        "multi": false,
        "name": "datasource",
        "options": [],
        "query": "prometheus",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      }
    ]
  },
  "time": {
    "from": "now-5m",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "browser",
  "title": "Policy Summary - load-ramping",
  "version": 3,
  "weekStart": ""
}

The resulting dashboard looks as follows.
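
Outside Grafana, you can evaluate the same accept-percentage expression that the ACCEPT PERCENTAGE panel uses through the Prometheus HTTP API (GET /api/v1/query). The following Python sketch uses only the standard library. PROMETHEUS_URL is a hypothetical address that you would replace with your own Prometheus endpoint, and the sketch assumes that the metric and label names from the dashboard JSON (sampler_counter_total, decision_type, policy_name) exist in your Prometheus.

```python
import json
import urllib.parse
import urllib.request

# HYPOTHETICAL address: replace with the Prometheus endpoint behind your Grafana data source.
PROMETHEUS_URL = "http://localhost:9090"

# Same expression as the "ACCEPT PERCENTAGE" panel in the dashboard JSON.
ACCEPT_PCT_QUERY = (
    "(sum by (policy_name)(rate(sampler_counter_total"
    '{decision_type="DECISION_TYPE_ACCEPTED"}[30s]))'
    " / sum by (policy_name)(rate(sampler_counter_total{}[30s]))) * 100"
)


def build_query_url(base_url: str, query: str) -> str:
    """Instant-query URL for the Prometheus HTTP API (GET /api/v1/query)."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})


def parse_vector(body: dict) -> dict[str, float]:
    """Turn an instant-vector response into {policy_name: value}."""
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {body.get('error')}")
    return {
        sample["metric"].get("policy_name", ""): float(sample["value"][1])
        for sample in body["data"]["result"]
    }


def accept_percentage_by_policy(base_url: str = PROMETHEUS_URL) -> dict[str, float]:
    """Query Prometheus and return the current accept percentage per policy."""
    url = build_query_url(base_url, ACCEPT_PCT_QUERY)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_vector(json.load(resp))


# Usage (needs a reachable Prometheus):
#   print(accept_percentage_by_policy())
```

Near the end of a successful rollout, the value reported for the load-ramping policy should approach 100.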