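When request traffic to an application's interfaces surges, you can configure an HPA scaling policy based on the QPS of the Java application's interfaces so that the application scales elastically. This topic describes how to use the ARMS APM application monitoring service to implement HPA auto scaling for an application.
How it works
After a Java application in an ACK cluster is connected to the ARMS APM application monitoring service, ARMS APM collects the access details of the application's interfaces. For information about how to connect a Java application to ARMS APM, see Application Monitoring. ARMS APM converts its data into the Alibaba Cloud Prometheus data format, and the alibaba-cloud-metrics-adapter component converts the Alibaba Cloud Prometheus metrics into metrics that HPA can consume, which enables HPA auto scaling for the application.
This topic uses the application arms-springboot-demo as an example and load-tests its /demo/queryUser/10 interface.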
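Prerequisites
The Alibaba Cloud Prometheus monitoring component is deployed. For more information, see Step 1: Enable Alibaba Cloud Prometheus monitoring.
The alibaba-cloud-metrics-adapter component is deployed in the kube-system namespace. For more information, see Deploy the alibaba-cloud-metrics-adapter component.
A namespace is created. For more information, see Manage namespaces and quotas. The sample namespace used in this topic is arms-demo.
A JDK is installed. For the JDK versions supported by ARMS APM application monitoring, see Java components and frameworks supported by ARMS Application Monitoring.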
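Procedure
Step 1: Install the ARMS APM application monitoring component
To connect an application to ARMS APM application monitoring, install the ARMS APM application monitoring component ack-onepilot in the cluster.
Log on to the Container Service management console and choose Clusters in the left-side navigation pane.
On the Clusters page, click the name of the target cluster, and then use the left-side navigation pane to open the Add-ons page.
On the Add-ons page, search for and locate the ack-onepilot component, click Install on the component card, configure the parameters as prompted in the dialog box, and then click OK.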
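Step 2: Grant access permissions on ARMS resources
To monitor applications in an ACK Serverless cluster or in a cluster connected to ECI, complete the authorization on the Cloud Resource Access Authorization page, and then restart all Pods of the ack-onepilot component.
To monitor applications in an ACK cluster, first check whether an ARMS Addon Token exists.
If an ARMS Addon Token exists in the ACK cluster, ARMS performs password-free authorization.
Note: Managed Kubernetes clusters have an ARMS Addon Token by default. However, some managed Kubernetes clusters that were created early on may not have one. In that case, follow the instructions below to manually grant the cluster access permissions on ARMS resources.
If no ARMS Addon Token exists in the ACK cluster, perform the following operations to manually grant the cluster access permissions on ARMS resources.
Create a custom permission policy with the following content. For more information, see Step 1: Create a custom permission policy.
{
  "Action": "arms:*",
  "Resource": "*",
  "Effect": "Allow"
}
Attach the custom policy created in the previous step to the cluster's WorkerRole. For more information, see Step 2: Grant permissions to the cluster's Worker RAM role.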
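Step 3: Enable ARMS APM application monitoring for the Java application
When you deploy a Java application in the cluster, enable ARMS APM application monitoring by adding labels to the application.
Log on to the Container Service management console and choose Clusters in the left-side navigation pane.
On the Clusters page, click the name of the target cluster, and then use the left-side navigation pane to open the Deployments (stateless workloads) page.
In the upper-right corner of the Deployments page, click Create from YAML.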
Select a sample template, and in the template (YAML format), add the following labels under spec.template.metadata.
labels:
  armsPilotAutoEnable: "on"
  armsPilotCreateAppName: "<your-deployment-name>"    # Replace <your-deployment-name> with the name of your application.
  one-agent.jdk.version: "OpenJDK11"    # Required if the application runs on JDK 11.
  armsSecAutoEnable: "on"    # Required if you want to enable Application Security.
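The following YAML sample template shows how to create a stateless application (Deployment) with ARMS APM application monitoring enabled.
Check the result of enabling ARMS APM application monitoring.
On the Deployments page, an ARMS Console button appears in the Actions column of the target application.
Click ARMS Console to view the monitoring data. In the left-side navigation pane, click Interface Invocation to view the access details of the application's interfaces (such as HTTP interfaces). The demo application arms-springboot-demo used here automatically generates a steady stream of interface calls.
Manually create a Service associated with the arms-springboot-demo application and enable load balancing so that you can access the application's interface.
On the Clusters page, click the name of the target cluster, and then use the left-side navigation pane to open the Services page.
In the upper-right corner of the page, click Create, configure a Service associated with the application, and then click Create. For descriptions of the configuration items, see Create a Service.
Wait a moment for the creation to complete. On the Services page, record the external endpoint of arms-demo-svc, for example 47.94.XX.XX:8080.
Run the following command to access the /demo/queryUser/10 interface of the Service through the external endpoint.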
curl http://47.94.XX.XX:8080/demo/queryUser/10
Expected output:
{"id":1,"name":"KeyOfSpectator","password":"12****"}
The expected output indicates that the interface can be accessed normally.
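Step 4: Connect the alibaba-cloud-metrics-adapter component
Make sure that the Alibaba Cloud Prometheus monitoring component is deployed. Otherwise, you cannot perform this step. For more information, see Step 1: Enable Alibaba Cloud Prometheus monitoring.
Make sure that the alibaba-cloud-metrics-adapter component is deployed in the kube-system namespace. Otherwise, you cannot perform this step. For more information, see Deploy the alibaba-cloud-metrics-adapter component.
Log on to the ARMS console. Use the left-side navigation pane to open the instance list page of Managed Service for Prometheus.
On the instance list page, click the name of the target instance (in the format arms_metrics_{RegionId}_XXX), click Settings in the left-side navigation pane, and then view and record the HTTP API address (Grafana read address) at the bottom of the Settings tab on the right. This address is the Prometheus URL.
Enter the HTTP API address (Grafana read address) recorded in the previous step, that is, the Prometheus URL, in ack-alibaba-cloud-metrics-adapter.
Log on to the Container Service management console and choose Clusters in the left-side navigation pane.
On the Clusters page, click the name of the target cluster, and then use the left-side navigation pane to open the Helm page.
On the Helm page, find the row of ack-alibaba-cloud-metrics-adapter and click Update in the Actions column.
In the Update Release panel, insert the Prometheus URL that you recorded earlier.
Modify the adapter-config configuration of ack-alibaba-cloud-metrics-adapter.
On the Helm page, click ack-alibaba-cloud-metrics-adapter.
On the Basic Information tab, click adapter-config.
In the upper-right corner of the page, click Edit YAML.
Add the following content to adapter-config.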
rules:
- metricsQuery: sum by (rpc) (sum_over_time(<<.Series>>{rpc="/demo/queryUser/{id}",service="arms-demo:arms-k8s-demo",prpc="__all__",ppid="__all__",endpoint="__all__",destId="__all__",<<.LabelMatchers>>}[1m]))
  name:
    as: ${1}_per_second_queryuser
    matches: ^(.*)_count
  resources:
    namespaced: false
  seriesQuery: arms_app_requests_count
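The complete example is as follows:
Run the following commands to view the metric data in the cluster.
Run the following command to check whether the metric arms_app_requests_per_second_queryuser exists.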
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"
Expected output:
{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"external.metrics.k8s.io/v1beta1","resources":[{"name":"k8s_workload_memory_working_set","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_rss","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_p9999","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_memory_usage","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_traffic_rx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_inflow","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_upstream_rt","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_ratio","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_traffic_tx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_packet_rx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"ahas_sentinel_pass_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_memory_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_avg","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_max_connection","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_cpu_limit","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_day","singularName":"","nam
espaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_month","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_connection_utilization","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_status_5xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_cpu_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_network_rx_rate","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_network_rx_errors","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_week","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_cache","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_cpu_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_percorepricing","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_packet_tx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_status_4xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_upstream_5xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_cpu_limit","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"ahas_sentinel_block_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_cpu_utilization","singularName":"","namespaced":tru
e,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_alb_ingress_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_memory_utilization","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_p95","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_active_connection","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_limit","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_hour","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_status_2xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_status_3xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_cpu_util","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_usage","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_network_tx_rate","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"ahas_sentinel_total_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_cpu_usage","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_network_tx_errors","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_rt","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_min","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_p50","singularName":"","namespaced":true,"kind":"Ext
ernalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_p99","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_upstream_4xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"ahas_sentinel_avg_rt","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_memory_limit","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"arms_app_requests_per_second_queryuser","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]}]}
The expected output indicates that the metric arms_app_requests_per_second_queryuser exists.
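The metric name arms_app_requests_per_second_queryuser is derived from the name rule in adapter-config: the series arms_app_requests_count is matched against ^(.*)_count, and the captured group replaces ${1} in ${1}_per_second_queryuser. The following Python sketch reproduces only that renaming step so you can predict the exposed name before querying the API. The function name exposed_metric_name is hypothetical, and the placeholder handling is an assumption about how the adapter applies the rule, not its actual implementation.

```python
import re

# Values copied from the adapter-config rule shown in this topic.
SERIES_NAME = "arms_app_requests_count"
MATCHES = r"^(.*)_count"
AS_TEMPLATE = "${1}_per_second_queryuser"


def exposed_metric_name(series: str, matches: str, as_template: str) -> str:
    """Return the metric name produced by a name rule (illustrative only).

    Assumption: each ${N} placeholder in `as_template` is replaced by the
    N-th capture group of `matches` applied to the series name.
    """
    match = re.match(matches, series)
    if match is None:
        raise ValueError(f"series {series!r} does not match {matches!r}")
    # Replace every ${N} placeholder with the corresponding capture group.
    return re.sub(
        r"\$\{(\d+)\}",
        lambda placeholder: match.group(int(placeholder.group(1))),
        as_template,
    )


if __name__ == "__main__":
    print(exposed_metric_name(SERIES_NAME, MATCHES, AS_TEMPLATE))
```

If the name printed by the sketch does not appear in the API resource list, the rule in adapter-config is a likely place to look first.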
Run the following command to view real-time metric data.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/arms-demo/arms_app_requests_per_second_queryuser"| jq .
Expected output:
{ "kind": "ExternalMetricValueList", "apiVersion": "external.metrics.k8s.io/v1beta1", "metadata": {}, "items": [ { "metricName": "arms_app_requests_per_second_queryuser", "metricLabels": { "rpc": "/demo/queryUser/10" }, "timestamp": "2022-11-09T07:49:07Z", "value": "6" } ] }
The expected output indicates that real-time data is returned normally.
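If you want to check the metric from a script instead of reading the JSON by eye, the response can be parsed as in the following Python sketch. It embeds the sample response from this topic rather than calling the cluster. The helper names parse_quantity and metric_values are hypothetical, and the quantity parser deliberately handles only plain numbers and the milli suffix "m" (for example "300m"), which is an assumption that covers the values shown in this topic but not every Kubernetes quantity format.

```python
import json

# Sample response copied from the expected output in this topic.
SAMPLE_RESPONSE = """
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "arms_app_requests_per_second_queryuser",
      "metricLabels": {"rpc": "/demo/queryUser/10"},
      "timestamp": "2022-11-09T07:49:07Z",
      "value": "6"
    }
  ]
}
"""


def parse_quantity(text: str) -> float:
    """Convert a quantity string such as "6" or "300m" to a float.

    Only plain numbers and the milli suffix "m" are handled in this sketch.
    """
    if text.endswith("m"):
        return float(text[:-1]) / 1000.0
    return float(text)


def metric_values(response_text: str) -> dict:
    """Map each item's rpc label to its numeric metric value."""
    doc = json.loads(response_text)
    return {
        item["metricLabels"].get("rpc", ""): parse_quantity(item["value"])
        for item in doc["items"]
    }


if __name__ == "__main__":
    print(metric_values(SAMPLE_RESPONSE))
```

An empty items list in a real response would produce an empty dictionary, which usually means the query in adapter-config matched no series.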
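Step 5: Configure HPA scaling based on the APM metric
Create hpa.yaml with the following content.
Note: The metric name configured in hpa.yaml must match the metric name defined in ack-alibaba-cloud-metrics-adapter in the previous step.
The target in hpa.yaml is the scaling threshold. With the AverageValue target used here, a scale-out is triggered when the QPS metric averaged across Pods exceeds 40.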
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: test-hpa
  namespace: arms-demo    # Matches the namespace used by the kubectl commands in this topic.
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: arms-springboot-demo
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: arms_app_requests_per_second_queryuser
        # External metrics support only Value and AverageValue target types.
        target:
          type: AverageValue
          averageValue: 40
Run the following command to deploy the HPA for the application arms-springboot-demo.
kubectl apply -f hpa.yaml
Run the following command to view changes in the metric.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/arms-demo/arms_app_requests_per_second_queryuser"| jq .
Expected output:
{ "kind": "ExternalMetricValueList", "apiVersion": "external.metrics.k8s.io/v1beta1", "metadata": {}, "items": [ { "metricName": "arms_app_requests_per_second_queryuser", "metricLabels": { "rpc": "/demo/queryUser/10" }, "timestamp": "2022-11-09T07:53:16Z", "value": "4216" } ] }
Run the following command to view the HPA details.
kubectl get hpa -n arms-demo
Expected output:
NAME       REFERENCE                         TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
test-hpa   Deployment/arms-springboot-demo   300m/40 (avg)   1         10        10         148m
The expected output indicates that the TARGETS column contains data and that the HPA is configured successfully.
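To relate the metric value to the replica count, the following Python sketch applies the replica calculation that Kubernetes documents for an External metric with an AverageValue target: the total metric value is divided by the target average and rounded up, unless the current ratio is within the tolerance, and the result is clamped to minReplicas and maxReplicas. The function name desired_replicas is hypothetical. The 0.1 tolerance is the Kubernetes default, and stabilization windows and scaling behavior policies are ignored, so treat this as an approximation rather than the controller's exact behavior.

```python
import math


def desired_replicas(
    total_metric: float,
    target_average: float,
    current_replicas: int,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Estimate the replica count for an AverageValue target (illustrative).

    Assumes current_replicas >= 1 and target_average > 0.
    """
    # Ratio of the observed per-Pod average to the target average.
    usage_ratio = total_metric / (target_average * current_replicas)
    if abs(1.0 - usage_ratio) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(total_metric / target_average)
    # Clamp to the bounds configured in hpa.yaml.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Metric value 4216 from the sample output, target 40, bounds 1..10.
    print(desired_replicas(4216, 40, current_replicas=1, min_replicas=1, max_replicas=10))
    # Metric value 6 from the earlier sample output: stays at the minimum.
    print(desired_replicas(6, 40, current_replicas=1, min_replicas=1, max_replicas=10))
```

With the sample value 4216 and a target of 40, the unclamped result is 106, so the HPA settles at maxReplicas, which is consistent with the REPLICAS value of 10 in the output above. In the TARGETS column, 300m is Kubernetes quantity notation for 0.3, the current per-Pod average.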
Verify auto scaling with a load test
Run the following command to load-test the demo application.
ab -c 50 -n 2000 http://47.94.XX.XX:8080/demo/queryUser/10
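Note: 47.94.XX.XX:8080 is the external endpoint of the Service arms-demo-svc.
View the auto scaling effect.
In the ARMS APM console, you can see that the request volume of this interface surges because of the load test.
On the Prometheus dashboard, you can see that HPA scaling is triggered when the QPS of the application interface exceeds the threshold.
In the ACK cluster, you can see that the number of Pod replicas of the demo application scales with the QPS of the interface calls.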
Run the following command to view the scaling events that occurred.
kubectl describe hpa test-hpa -n arms-demo