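When request traffic to an application's interfaces surges, you can configure a Horizontal Pod Autoscaler (HPA) policy based on the QPS of a Java application interface to scale the application automatically. This topic describes how to implement HPA-based scaling by using the ARMS APM application monitoring service.
How it works
After you connect a Java application in an ACK cluster to the ARMS APM application monitoring service, you can use ARMS APM to obtain access details for the application's interfaces. For information about how to connect a Java application to ARMS APM, see Application Monitoring. The ARMS APM service converts ARMS APM data into the Alibaba Cloud Prometheus data format, and the alibaba-cloud-metrics-adapter component converts the Alibaba Cloud Prometheus metrics into metrics that HPA can consume. Together, these steps enable HPA-based scaling of the application.
This topic uses the arms-springboot-demo application as an example: the application is deployed, and its /demo/queryUser/10 interface is then stress-tested.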
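Prerequisites
The Alibaba Cloud Prometheus monitoring component is deployed. For more information, see Step 1: Enable Alibaba Cloud Prometheus monitoring.
The alibaba-cloud-metrics-adapter component is deployed in the kube-system namespace. For more information, see Deploy the alibaba-cloud-metrics-adapter component.
A namespace is created. For more information, see Manage namespaces and quotas. The sample namespace used in this topic is arms-demo.
A JDK is installed. For the JDK versions that ARMS APM application monitoring supports, see Java components and frameworks supported by ARMS Application Monitoring.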
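Procedure
Step 1: Install the ARMS APM application monitoring component
To connect an application to ARMS APM application monitoring, install the ARMS APM application monitoring component ack-onepilot in the cluster.
Log on to the Container Service management console. In the left-side navigation pane, choose Clusters.
On the Clusters page, click the name of the target cluster, and then use the left-side navigation pane to go to the Component Management page.
On the Component Management page, search for and locate the ack-onepilot component, click Install on the component card, configure the parameters as prompted in the dialog box, and then click OK.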
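Step 2: Grant access permissions on ARMS resources
To monitor applications in an ACK Serverless cluster or in a cluster connected to ECI, complete the authorization on the Cloud Resource Access Authorization page, and then restart all Pods of the ack-onepilot component.
To monitor applications in an ACK cluster, first check whether an ARMS Addon Token exists.
If the ACK cluster has an ARMS Addon Token, ARMS performs password-free authorization.
Note: Managed Kubernetes clusters have an ARMS Addon Token by default. However, some managed Kubernetes clusters that were created early may not have one. In that case, follow the instructions below to manually grant the cluster access permissions on ARMS resources.
If the ACK cluster does not have an ARMS Addon Token, perform the following operations to manually grant the cluster access permissions on ARMS resources.
Create a custom policy with the following content. For more information, see Step 1: Create a custom policy.
{
    "Action": "arms:*",
    "Resource": "*",
    "Effect": "Allow"
}
Attach the custom policy created in the previous step to the WorkerRole of the cluster. For more information, see Step 2: Grant permissions to the Worker RAM role of the cluster.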
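The content shown in the policy step is a single statement. As a minimal sketch, assuming the policy editor expects the usual RAM policy document layout (a top-level "Version" of "1" and a "Statement" list), the snippet below wraps that statement into a complete document. The function name build_arms_policy is hypothetical and used only for illustration.

```python
import json


def build_arms_policy():
    """Wrap the ARMS statement from this step in a full RAM policy document.

    Assumption: the standard RAM policy layout with "Version": "1" and a
    "Statement" list. Adjust it if your policy editor expects something else.
    """
    statement = {"Action": "arms:*", "Resource": "*", "Effect": "Allow"}
    return {"Version": "1", "Statement": [statement]}


if __name__ == "__main__":
    # Print the document so that it can be pasted into the policy editor.
    print(json.dumps(build_arms_policy(), indent=4))
```

Note that "arms:*" on all resources is broad. It is reproduced here only because the step above specifies it.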
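Step 3: Enable ARMS APM application monitoring for the Java application
When you deploy a Java application in the cluster, enable ARMS APM application monitoring by adding labels to the application.
Log on to the Container Service management console. In the left-side navigation pane, choose Clusters.
On the Clusters page, click the name of the target cluster, and then use the left-side navigation pane to go to the Deployments (stateless) page.
In the upper-right corner of the Deployments page, click Create from YAML.
Select a sample template, and in the template (YAML format) add the following labels under spec.template.metadata.
labels:
  armsPilotAutoEnable: "on"
  armsPilotCreateAppName: "<your-deployment-name>"    # Replace <your-deployment-name> with your application name.
  one-agent.jdk.version: "OpenJDK11"    # Required only if the application uses JDK 11.
  armsSecAutoEnable: "on"    # Required only if you want to enable Application Security.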
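To make the placement of these labels concrete, the following is a minimal sketch that adds them to a Deployment manifest held as a Python dictionary. The helper name enable_arms_monitoring and the skeleton manifest are hypothetical. A real manifest also needs a selector and container definitions.

```python
import copy


def enable_arms_monitoring(deployment, app_name, jdk11=False, app_security=False):
    """Return a copy of a Deployment manifest with the ARMS labels set
    under spec.template.metadata.labels (the input is left unchanged)."""
    patched = copy.deepcopy(deployment)
    template = patched.setdefault("spec", {}).setdefault("template", {})
    labels = template.setdefault("metadata", {}).setdefault("labels", {})
    labels["armsPilotAutoEnable"] = "on"
    labels["armsPilotCreateAppName"] = app_name
    if jdk11:
        # Only needed when the application runs on JDK 11.
        labels["one-agent.jdk.version"] = "OpenJDK11"
    if app_security:
        # Only needed when Application Security should be enabled.
        labels["armsSecAutoEnable"] = "on"
    return patched


if __name__ == "__main__":
    # Hypothetical skeleton manifest, for illustration only.
    skeleton = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "arms-springboot-demo", "namespace": "arms-demo"},
        "spec": {"template": {"metadata": {"labels": {"app": "arms-springboot-demo"}}}},
    }
    patched = enable_arms_monitoring(skeleton, "arms-springboot-demo")
    print(patched["spec"]["template"]["metadata"]["labels"])
```

The labels belong to the Pod template rather than to the Deployment's own metadata, which is why the helper writes to spec.template.metadata.labels.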
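The following YAML sample template shows how to create a stateless application (Deployment) with ARMS APM application monitoring enabled.
Check the result of enabling ARMS APM for the application.
On the Deployments page, an ARMS Console button appears in the Actions column of the target application.
Click ARMS Console to view the monitoring data. In the left-side navigation pane, click Interface Invocation to view access details for the application's interfaces (such as HTTP interfaces). The demo application arms-springboot-demo used here automatically generates a steady stream of interface calls.
Manually create a Service that is associated with the arms-springboot-demo application, and enable load balancing so that the application's interfaces can be accessed.
On the Clusters page, click the name of the target cluster, and then use the left-side navigation pane to go to the Services page.
In the upper-right corner of the page, click Create, configure a Service that is associated with the application, and then click Create. For a description of the configuration items, see Create a Service.
Wait a moment until the Service is created. On the Services page, record the external endpoint of arms-demo-svc, for example 47.94.XX.XX:8080.
Run the following command to access the /demo/queryUser/10 interface of the Service through the external endpoint.
curl http://47.94.XX.XX:8080/demo/queryUser/10
Expected output:
{"id":1,"name":"KeyOfSpectator","password":"12****"}
The expected output indicates that the interface can be accessed as expected.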
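Step 4: Integrate with the alibaba-cloud-metrics-adapter component
Make sure that the Alibaba Cloud Prometheus monitoring component is deployed. Otherwise, this step cannot be performed. For more information, see Step 1: Enable Alibaba Cloud Prometheus monitoring.
Make sure that the alibaba-cloud-metrics-adapter component is deployed in the kube-system namespace. Otherwise, this step cannot be performed. For more information, see Deploy the alibaba-cloud-metrics-adapter component.
Log on to the ARMS console.
Use the left-side navigation pane to go to the Instances page of Managed Service for Prometheus.
On the Instances page, click the name of the target instance (in the format arms_metrics_{RegionId}_XXX). In the left-side navigation pane, click Settings. At the bottom of the Settings tab on the right, find and record the HTTP API address (Grafana read address). This address is the Prometheus URL.
In ack-alibaba-cloud-metrics-adapter, enter the HTTP API address (Grafana read address) recorded in the previous step as the Prometheus URL.
Modify the adapter-config configuration of ack-alibaba-cloud-metrics-adapter.
On the Helm page, click ack-alibaba-cloud-metrics-adapter.
On the Basic Information tab, click adapter-config.
In the upper-right corner of the page, click Edit YAML.
Add the following content to adapter-config.
rules:
- metricsQuery: sum by (rpc) (sum_over_time(<<.Series>>{rpc="/demo/queryUser/{id}",service="arms-demo:arms-k8s-demo",prpc="__all__",ppid="__all__",endpoint="__all__",destId="__all__",<<.LabelMatchers>>}[1m]))
  name:
    as: ${1}_per_second_queryuser
    matches: ^(.*)_count
  resources:
    namespaced: false
  seriesQuery: arms_app_requests_count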
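The name section of this rule determines the metric name that HPA later refers to. The sketch below approximates that renaming in Python so you can predict the exposed name before editing the configuration. It is an illustration of the matches/as pair only, not the adapter's own code, and the function name exposed_metric_name is hypothetical.

```python
import re


def exposed_metric_name(series_name,
                        matches=r"^(.*)_count",
                        as_template="${1}_per_second_queryuser"):
    """Approximate the rule's name.matches / name.as renaming.

    Returns the exposed metric name, or None when the series name does not
    match the pattern.
    """
    match = re.search(matches, series_name)
    if match is None:
        return None
    # Rewrite "${1}"-style group references into Python's "\g<1>" form.
    python_template = re.sub(r"\$\{(\d+)\}", r"\\g<\1>", as_template)
    return match.expand(python_template)


if __name__ == "__main__":
    print(exposed_metric_name("arms_app_requests_count"))
```

With the values from the rule above, the series arms_app_requests_count is exposed as arms_app_requests_per_second_queryuser, which is the name used in the verification and HPA steps.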
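The complete example is as follows:
Check the metric data in the cluster.
Run the following command to check whether the metric arms_app_requests_per_second_queryuser exists.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"
Expected output: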
{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"external.metrics.k8s.io/v1beta1","resources":[{"name":"k8s_workload_memory_working_set","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_rss","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_p9999","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_memory_usage","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_traffic_rx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_inflow","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_upstream_rt","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_ratio","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_traffic_tx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_packet_rx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"ahas_sentinel_pass_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_memory_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_avg","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_max_connection","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_cpu_limit","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_day","singularName":"","nam
espaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_month","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_connection_utilization","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_status_5xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_cpu_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_network_rx_rate","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_network_rx_errors","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_week","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_cache","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_cpu_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_percorepricing","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_packet_tx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_status_4xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_upstream_5xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_cpu_limit","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"ahas_sentinel_block_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_cpu_utilization","singularName":"","namespaced":tru
e,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_alb_ingress_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_memory_utilization","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_p95","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_active_connection","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_limit","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_hour","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_status_2xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_status_3xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_cpu_util","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_usage","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_network_tx_rate","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"ahas_sentinel_total_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_cpu_usage","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_network_tx_errors","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_rt","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_min","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_p50","singularName":"","namespaced":true,"kind":"Ext
ernalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_p99","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_upstream_4xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"ahas_sentinel_avg_rt","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_memory_limit","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"arms_app_requests_per_second_queryuser","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]}]}
The expected output indicates that the metric arms_app_requests_per_second_queryuser exists.
Run the following command to view the real-time data of the metric.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/arms-demo/arms_app_requests_per_second_queryuser" | jq .
Expected output:
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "arms_app_requests_per_second_queryuser",
      "metricLabels": {
        "rpc": "/demo/queryUser/10"
      },
      "timestamp": "2022-11-09T07:49:07Z",
      "value": "6"
    }
  ]
}
The expected output indicates that real-time data is returned as expected.
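Because the discovery output is long, scanning it by eye is error-prone. As a minimal sketch, the helper below checks an APIResourceList JSON string for a given metric name. The function name has_external_metric and the shortened sample are illustrative assumptions.

```python
import json


def has_external_metric(api_resource_list_json, metric_name):
    """Return True if the APIResourceList JSON lists the given metric name."""
    doc = json.loads(api_resource_list_json)
    return any(res.get("name") == metric_name for res in doc.get("resources", []))


if __name__ == "__main__":
    # Shortened sample in the same shape as the output shown above.
    sample = json.dumps({
        "kind": "APIResourceList",
        "groupVersion": "external.metrics.k8s.io/v1beta1",
        "resources": [
            {"name": "slb_l7_qps", "namespaced": True},
            {"name": "arms_app_requests_per_second_queryuser", "namespaced": True},
        ],
    })
    print(has_external_metric(sample, "arms_app_requests_per_second_queryuser"))
```

For a live cluster, pass the text returned by the kubectl command above to the function instead of the sample.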
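The value field is a Kubernetes quantity string, so it can appear as a plain number such as "6" or in milli-units such as "300m". The following sketch extracts the per-interface values from an ExternalMetricValueList. It handles only plain numbers and the "m" suffix, which is an assumption made to keep the example short, and the function names are hypothetical.

```python
import json


def parse_quantity(text):
    """Parse a plain or milli ("m" suffix) quantity string into a float.

    Other quantity suffixes (for example k, M, Ki) are not handled here.
    """
    if text.endswith("m"):
        return int(text[:-1]) / 1000
    return float(text)


def metric_values(external_metric_value_list_json):
    """Return (rpc label, numeric value) pairs from an ExternalMetricValueList."""
    doc = json.loads(external_metric_value_list_json)
    pairs = []
    for item in doc.get("items", []):
        labels = item.get("metricLabels") or {}
        pairs.append((labels.get("rpc"), parse_quantity(item["value"])))
    return pairs


if __name__ == "__main__":
    sample = json.dumps({
        "kind": "ExternalMetricValueList",
        "items": [{
            "metricName": "arms_app_requests_per_second_queryuser",
            "metricLabels": {"rpc": "/demo/queryUser/10"},
            "timestamp": "2022-11-09T07:49:07Z",
            "value": "6",
        }],
    })
    print(metric_values(sample))
```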
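Step 5: Configure HPA scaling based on the APM metric
Create a file named hpa.yaml with the following content.
Note: The metric name configured in hpa.yaml must be the same as the metric name defined in ack-alibaba-cloud-metrics-adapter in the previous step. The target in hpa.yaml is the scaling threshold: scale-out is triggered when the metric value averaged per Pod exceeds 40 (QPS > 40 in this example).
# autoscaling/v2beta2 was removed in Kubernetes 1.26. On newer clusters, use autoscaling/v2.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: test-hpa
  namespace: arms-demo    # Same namespace as the target Deployment and as the later kubectl commands.
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: arms-springboot-demo
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: arms_app_requests_per_second_queryuser
        # For External metrics, only Value and AverageValue target types are supported.
        target:
          type: AverageValue
          averageValue: 40
Run the following command to deploy the HPA for the arms-springboot-demo application.
kubectl apply -f hpa.yaml
Run the following command to view how the metric changes.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/arms-demo/arms_app_requests_per_second_queryuser" | jq .
Expected output:
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "arms_app_requests_per_second_queryuser",
      "metricLabels": {
        "rpc": "/demo/queryUser/10"
      },
      "timestamp": "2022-11-09T07:53:16Z",
      "value": "4216"
    }
  ]
}
Run the following command to view the HPA details.
kubectl get hpa -n arms-demo
Expected output:
NAME       REFERENCE                         TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
test-hpa   Deployment/arms-springboot-demo   300m/40 (avg)   1         10        10         148m
The expected output indicates that the TARGETS column contains data and that the HPA is configured successfully.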
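To relate the metric values to the replica counts, the following sketch applies the basic HPA calculation for an External metric with an AverageValue target: the total metric value divided by the target, rounded up, then clamped to minReplicas and maxReplicas. It deliberately ignores the controller's tolerance band and stabilization behavior, so treat it as an approximation. The function name desired_replicas is hypothetical.

```python
import math


def desired_replicas(metric_total, target_average, min_replicas, max_replicas):
    """Approximate the HPA replica count for an AverageValue target.

    The raw count is ceil(total metric value / target average per Pod),
    bounded by the configured minimum and maximum.
    """
    raw = math.ceil(metric_total / target_average)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Values from this topic: metric value 4216, target 40, 1 to 10 replicas.
    print(desired_replicas(4216, 40, 1, 10))
    # A total of about 3 (300m averaged over 10 Pods) stays at the minimum.
    print(desired_replicas(3, 40, 1, 10))
```

With the sample value 4216 and a target of 40, the raw result of 106 is capped at maxReplicas, which is in line with the 10 replicas shown in the output above.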
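View the scaling results through a stress test
Run the following command to stress-test the demo application.
ab -c 50 -n 2000 http://47.94.XX.XX:8080/demo/queryUser/10
Note: 47.94.XX.XX:8080 is the external endpoint of the arms-demo-svc Service.
View the scaling results.
In the ARMS APM console, you can see that the request volume of this interface surges because of the stress test.
On the Prometheus dashboard, you can see that HPA scaling is triggered when the QPS of the application interface exceeds the threshold.
In the ACK cluster, you can see that the number of Pod replicas of the demo application scales with the QPS of the interface calls.
You can run the command kubectl describe hpa test-hpa -n arms-demo to view the scaling events that occurred.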