Use LoadRampingPolicy to implement progressive service rollout - Alibaba Cloud Service Mesh

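The ASM traffic scheduling suite supports a progressive service rollout policy. When you release a new service, configuring this policy at the same time makes the traffic that the service receives increase gradually, so that the service comes online smoothly. This topic describes how to use the LoadRampingPolicy provided by the traffic scheduling suite to roll out a service progressively.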

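Background information

LoadRampingPolicy, the progressive service rollout policy, brings a service online by gradually increasing the number of requests that the service receives. The components involved and the general workflow are as follows:

Request sampler: LoadRampingPolicy uses a request sampler to reject a certain percentage of requests. In the early stage of a rollout, the request sampler rejects a large percentage of the requests sent to the service.
Load meter: LoadRampingPolicy uses a load meter to determine the load status of the service. While the load status stays within the specified thresholds, the request sampler lowers the rejection percentage according to the configured steps until almost all requests are accepted, so that the request load on the service increases gradually.

When you bring a new service online in a cluster, the progressive rollout policy gradually increases the traffic that the service receives, which prevents a freshly deployed service from failing under a sudden burst of traffic. At the same time, the policy monitors the service load in real time and raises the percentage of accepted traffic step by step until the service is fully online.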
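
The role of the request sampler can be pictured with a small Python sketch. It is an illustration only: the function name `should_accept` and the use of uniform random sampling are assumptions made for this example and are not taken from the ASM implementation.

```python
import random


def should_accept(accept_percentage: float, rng: random.Random) -> bool:
    """Accept a request with probability accept_percentage / 100 (illustrative only)."""
    return rng.random() * 100.0 < accept_percentage


rng = random.Random(42)  # fixed seed so that the sketch is reproducible
total = 10_000

# Early in the rollout almost every request is rejected; at the end none is.
accepted_early = sum(should_accept(1.0, rng) for _ in range(total))
accepted_late = sum(should_accept(100.0, rng) for _ in range(total))

print(f"accept percentage 1%:   {accepted_early}/{total} accepted")
print(f"accept percentage 100%: {accepted_late}/{total} accepted")
```

In this picture, the load meter only decides how quickly `accept_percentage` is allowed to rise; the sampler itself stays a simple per-request decision.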

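Prerequisites

A Kubernetes managed cluster is added to the ASM instance, and the ASM instance is version 1.21.X.XX or later. For more information, see Add a cluster to an ASM instance.
Automatic sidecar injection is enabled for the default namespace of the Kubernetes cluster. For more information, see Manage global namespaces.
You are connected to the ACK cluster by using kubectl. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
The ASM traffic scheduling suite is enabled. For more information, see Enable the ASM traffic scheduling suite.
The httpbin application is deployed and can be accessed through the gateway. For more information, see Deploy the httpbin application.

Step 1: Create a LoadRampingPolicy

Use kubectl to connect to the ASM instance. For more information, see Use kubectl on the control plane to access Istio resources.

Create a file named LoadRampingPolicy.yaml with the following content.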

apiVersion: istio.alibabacloud.com/v1
kind: LoadRampingPolicy
metadata:
  name: load-ramping
  namespace: istio-system
spec:
  drivers:
    average_latency_drivers:
      - selectors:
          - service: httpbin.default.svc.cluster.local
        criteria:
          forward:
            threshold: 100
          reset:
            threshold: 200
  start: true
  load_ramp:
    sampler:
      selectors:
        - service: httpbin.default.svc.cluster.local
    steps:
      - duration: 0s
        target_accept_percentage: 1
      - duration: 300s
        target_accept_percentage: 100.0

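The following table describes some of the configuration items. For more information about the configuration items, see the description of the LoadRampingPolicy CRD.

Configuration item	Description
steps	Defines the rollout stages. The example defines two stages: the request sampler starts by accepting 1% of requests, and the accepted percentage is required to approach 100% within 300 seconds.
selectors	Specifies the services to which the progressive rollout policy applies. The example rolls out the httpbin.default.svc.cluster.local service.
criteria	Specifies the benchmarks used to measure service load. In the example, the rollout advances while the average latency of the service is below 100 ms. If the average latency exceeds 200 ms, the rollout is reset and the request sampler starts again from the highest rejection percentage.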
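
To make the interaction between `steps` and `criteria` more concrete, here is a minimal Python simulation of the example values. It rests on two assumptions that the text above does not confirm: the accept percentage is interpolated linearly within a step, and the ramp simply pauses while the average latency lies between the two thresholds. The names `Step`, `accept_percentage`, and `advance` are hypothetical. The LoadRampingPolicy CRD description remains the authority on the real behavior.

```python
from dataclasses import dataclass


@dataclass
class Step:
    duration_s: float
    target_accept_percentage: float


def accept_percentage(steps: list[Step], elapsed_s: float) -> float:
    """Accept percentage after elapsed_s seconds of uninterrupted ramping.

    Assumption: linear interpolation from the previous step's target to the
    current step's target over the current step's duration.
    """
    elapsed_s = max(elapsed_s, 0.0)
    current = steps[0].target_accept_percentage
    start = 0.0
    for step in steps:
        end = start + step.duration_s
        if elapsed_s >= end:
            # This step is finished: its target becomes the new baseline.
            current = step.target_accept_percentage
            start = end
            continue
        fraction = (elapsed_s - start) / step.duration_s
        return current + fraction * (step.target_accept_percentage - current)
    return current


def advance(elapsed_s: float, latency_ms: float, dt_s: float,
            forward_ms: float = 100.0, reset_ms: float = 200.0) -> float:
    """Return the new ramp position after observing one average-latency sample."""
    if latency_ms > reset_ms:
        return 0.0               # reset: start again from the first step
    if latency_ms < forward_ms:
        return elapsed_s + dt_s  # healthy: keep ramping
    return elapsed_s             # assumption: hold position between thresholds


steps = [Step(0, 1.0), Step(300, 100.0)]
elapsed = 0.0
for latency_ms in [50, 50, 150, 250, 50]:  # hypothetical samples, one per 60 s
    elapsed = advance(elapsed, latency_ms, dt_s=60)
    pct = accept_percentage(steps, elapsed)
    print(f"latency={latency_ms}ms elapsed={elapsed:.0f}s accept={pct:.1f}%")
```

The 250 ms sample in the loop exceeds the reset threshold, so the simulated ramp falls back to 1% before it resumes. This is the same effect that the cleanup note at the end of Step 2 warns about.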

Run the following command to deploy the progressive rollout policy.
```
kubectl apply -f LoadRampingPolicy.yaml
```
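
After applying the file, it can be useful to confirm that the resource exists. The Python sketch below wraps a `kubectl get` call for that purpose. It assumes that the kubeconfig in use points at the ASM instance and that the resource can be addressed as `loadrampingpolicy`, the same name that the delete command later in this topic uses. The helper names `get_policy_command` and `run_check` are hypothetical.

```python
import shutil
import subprocess


def get_policy_command(name: str = "load-ramping",
                       namespace: str = "istio-system") -> list[str]:
    """Build the kubectl command that fetches the policy as YAML."""
    return ["kubectl", "get", "loadrampingpolicy", name, "-n", namespace, "-o", "yaml"]


def run_check() -> str:
    """Run the command and return its output, or a hint if kubectl is missing."""
    if shutil.which("kubectl") is None:
        return "kubectl not found on PATH; skipping the check"
    result = subprocess.run(get_policy_command(), capture_output=True,
                            text=True, timeout=30)
    return result.stdout or result.stderr


# Print the command only; call run_check() from a shell that can reach the ASM instance.
print(" ".join(get_policy_command()))
```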

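Step 2: Test the progressive rollout

This step uses the load testing tool fortio. For installation instructions, see Install fortio.

Run the following command to start a load test against the httpbin service.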

fortio load -c 10 -qps 0 -t 300s -allow-initial-errors -a http://${ASM網關IP}/status/200

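Note

Replace ${ASM網關IP} in the preceding command with the IP address of the ASM gateway. For information about how to obtain the IP address of the ASM gateway, see Use Istio resources to implement version-based traffic routing.

Expected output: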

...
# target 50% 0.0613214
# target 75% 0.0685102
# target 90% 0.0756739
# target 99% 0.0870132
# target 99.9% 0.115361
Sockets used: 31529 (for perfect keepalive, would be 10)
Uniform: false, Jitter: false
Code 200 : 26718 (45.9 %)
Code 403 : 31510 (54.1 %)
Response Header Sizes : count 58228 avg 111.04245 +/- 120.6 min 0 max 243 sum 6465780
Response Body/Total Sizes : count 58228 avg 185.18012 +/- 52.32 min 137 max 243 sum 10782668
All done 58228 calls (plus 10 warmup) 51.524 ms avg, 194.1 qps

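The output shows that the average request latency is about 51 ms, which is below the 100 ms forward threshold configured in this example, so the rollout keeps advancing. About half of the requests (54.1%) are rejected with the 403 status code. During the 300 seconds of the test, the percentage of accepted requests increases gradually from 1%, and by the end of the test the service accepts 100% of the traffic.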
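
The "about half" figure can be checked against the configured ramp with a little arithmetic. The Python sketch below parses the two `Code` lines from the output above and compares the accepted share with the time-averaged accept percentage of a ramp from 1% to 100%. The comparison assumes a linear ramp, which this topic does not state explicitly, and the helper name `code_shares` is hypothetical.

```python
import re

# The two status-code lines copied from the fortio output above.
FORTIO_SUMMARY = """\
Code 200 : 26718 (45.9 %)
Code 403 : 31510 (54.1 %)
"""


def code_shares(output: str) -> dict[int, float]:
    """Return the percentage of calls per HTTP status code."""
    counts = {
        int(code): int(n)
        for code, n in re.findall(r"^Code (\d+) : (\d+)", output, flags=re.M)
    }
    total = sum(counts.values())
    return {code: 100.0 * n / total for code, n in counts.items()}


shares = code_shares(FORTIO_SUMMARY)

# Time-averaged accept percentage of a linear ramp from 1% to 100% (assumption).
linear_ramp_mean = (1 + 100) / 2

print(f"accepted: {shares[200]:.1f}%, rejected: {shares[403]:.1f}%")
print(f"linear-ramp time average: {linear_ramp_mean:.1f}%")
```

The measured share is weighted by requests rather than by time, and the request rate is not constant during the test, so a moderate gap between the two numbers would not be surprising.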

Run the following command to load test the httpbin service again.

fortio load -c 10 -qps 0 -t 300s -allow-initial-errors -a http://${ASM網關IP}/status/200

Expected output:

...
# target 50% 0.0337055
# target 75% 0.0368905
# target 90% 0.0396488
# target 99% 0.0791
# target 99.9% 0.123187
Sockets used: 455 (for perfect keepalive, would be 10)
Uniform: false, Jitter: false
Code 200 : 82959 (99.5 %)
Code 403 : 445 (0.5 %)
Response Header Sizes : count 83404 avg 240.71018 +/- 17.63 min 0 max 243 sum 20076192
Response Body/Total Sizes : count 83404 avg 241.44115 +/- 7.649 min 137 max 243 sum 20137158
All done 83404 calls (plus 10 warmup) 35.970 ms avg, 278.0 qps

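The expected output shows that only 0.5% of the requests are rejected. The service now accepts nearly 100% of the traffic, which indicates that the progressive rollout is complete.

Delete the progressive rollout policy.
1. Use kubectl to connect to the ASM instance. For more information, see Use kubectl on the control plane to access Istio resources.
2. Run the following command to delete the progressive rollout policy. The service rollout is then complete.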
```
kubectl delete loadrampingpolicy load-ramping -n istio-system
```
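Important
In the example, the LoadRampingPolicy uses a 300s duration to simulate a progressive service rollout. Because criteria.reset.threshold is configured, you must manually delete the policy after verification. Otherwise, a fluctuation in service latency can trigger the progressive rollout again and affect the normal operation of the service.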

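Related operations

You can use a Grafana dashboard to observe how the LoadRampingPolicy is executed. Make sure that the Prometheus instance that Grafana uses as its data source is configured to collect the metrics of the ASM traffic scheduling suite.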
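
One way to check that the metrics are being collected is to query Prometheus directly. The Python sketch below builds a request for the standard Prometheus HTTP API endpoint `/api/v1/query`, using the same accept-percentage expression as the first panel of the dashboard JSON in this section. The base URL `http://localhost:9090` is a placeholder assumption, and a managed Prometheus instance may require authentication that this sketch does not handle. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Same expression as the "ACCEPT PERCENTAGE" panel of the dashboard.
ACCEPT_PERCENTAGE_EXPR = (
    '(sum by (policy_name)(rate(sampler_counter_total'
    '{decision_type="DECISION_TYPE_ACCEPTED"}[30s]))'
    ' / sum by (policy_name)(rate(sampler_counter_total{}[30s]))) * 100'
)


def build_query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def query_prometheus(base_url: str, expr: str) -> dict:
    """Send the instant query and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_url(base_url, expr), timeout=10) as resp:
        return json.load(resp)


# Placeholder address; replace it with the address of your Prometheus instance.
print(build_query_url("http://localhost:9090", ACCEPT_PERCENTAGE_EXPR))
```

An empty result list for this query while a policy is active and traffic is flowing would suggest that the traffic scheduling suite metrics are not yet being scraped.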

Import the following content into Grafana to create a dashboard for the LoadRampingPolicy.


{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": 43,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": ""
        },
        "overrides": []
      },
      "gridPos": {
        "h": 10,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 1,
      "interval": "10s",
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "pluginVersion": "v10.1.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "(sum by (policy_name)(rate(sampler_counter_total{decision_type=\"DECISION_TYPE_ACCEPTED\"}[30s])) / sum by (policy_name)(rate(sampler_counter_total{}[30s]))) * 100",
          "intervalFactor": 1,
          "legendFormat": "policy_name={{policy_name}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "ACCEPT PERCENTAGE",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": ""
        },
        "overrides": []
      },
      "gridPos": {
        "h": 10,
        "w": 24,
        "x": 0,
        "y": 10
      },
      "id": 2,
      "interval": "10s",
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "pluginVersion": "v10.1.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "sum by (policy_name, decision_type)(rate(sampler_counter_total{component_id=\"root.14.1\"}[$__rate_interval]))",
          "intervalFactor": 1,
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Throughput - Accept/Reject",
      "type": "timeseries"
    }
  ],
  "refresh": false,
  "schemaVersion": 38,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "hide": 0,
        "includeAll": false,
        "label": "Data Source",
        "multi": false,
        "name": "datasource",
        "options": [],
        "query": "prometheus",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      }
    ]
  },
  "time": {
    "from": "now-5m",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "browser",
  "title": "Policy Summary - load-ramping",
  "version": 3,
  "weekStart": ""
}

The dashboard looks as follows.