Service Mesh (ASM) collects telemetry data for Container Service for Kubernetes (ACK) clusters in a non-intrusive manner, making service communication in these clusters observable. This telemetry makes service behavior observable and helps O&M staff troubleshoot, maintain, and optimize applications without additional maintenance costs. Based on the four key metrics of latency, traffic, errors, and saturation, ASM generates a series of metrics for the services that it manages. This topic describes how to implement auto scaling for workloads by using ASM metrics.
Prerequisites
An ACK cluster is created. For more information, see Create an ACK managed cluster.
An ASM instance is created. For more information, see Create an ASM instance.
A Prometheus instance and a Grafana instance are deployed in the ACK cluster. For more information, see Use open source Prometheus to monitor an ACK cluster.
A Prometheus instance is deployed to monitor the ASM instance. For more information, see Monitor ASM instances by using a self-managed Prometheus instance.
Background information
Service Mesh generates a series of metrics for the services that it manages. For more information, visit Istio Standard Metrics.
Auto scaling is an approach that automatically scales workloads up or down based on resource usage. Kubernetes provides two autoscalers to implement auto scaling.
Cluster Autoscaler (CA): CAs are used to increase or decrease the number of nodes in a cluster.
Horizontal Pod Autoscaler (HPA): HPAs are used to increase or decrease the number of pods that are used to deploy applications.
The aggregation layer of Kubernetes allows third-party applications to extend the Kubernetes API by registering themselves as API add-ons. These add-ons can be used to implement the custom metrics API and allow HPAs to query any metrics. HPAs periodically query core metrics such as CPU utilization and memory usage by using the resource metrics API. In addition, HPAs use the custom metrics API to query application-specific metrics, such as the observability metrics that are provided by ASM.
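To make the HPA's behavior concrete, its core scaling decision can be approximated as the ratio of the observed metric to the target value. The following sketch is a simplified model of that formula, not the actual Kubernetes controller implementation, which additionally applies tolerances, stabilization windows, and rate limits:

```python
import math

def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Approximate the HPA scaling formula:
    desired = ceil(current_replicas * current_value / target_value)."""
    return math.ceil(current_replicas * (current_value / target_value))

# If 4 replicas each observe 20 requests per second against a target of 10,
# the HPA scales out to 8 replicas.
print(desired_replicas(4, 20, 10))  # 8
```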
Step 1: Enable Prometheus monitoring for the ASM instance
For more information, see Collect metrics to Managed Service for Prometheus.
Step 2: Deploy the adapter for the custom metrics API
Run the following command to download the installation package of the adapter. Then, install and deploy the adapter for the custom metrics API in the ACK cluster.
For more information, visit kube-metrics-adapter.
# Use Helm 3.
helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter --set prometheus.url=http://prometheus.istio-system.svc:9090
After the installation is complete, run the following commands to check whether kube-metrics-adapter is enabled.
Run the following command to verify that autoscaling/v2beta exists:
kubectl api-versions | grep "autoscaling/v2beta"
Expected output:
autoscaling/v2beta
Run the following command to check the status of the pod of kube-metrics-adapter:
kubectl get po -n kube-system | grep metrics-adapter
Expected output:
asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2**** 1/1 Running 0 19s
Run the following command to query the custom metrics that are provided by kube-metrics-adapter:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Expected output:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": []
}
Step 3: Deploy a sample application
Create a namespace named test. For more information, see Manage namespaces and resource quotas.
Enable automatic sidecar proxy injection. For more information, see Install a sidecar proxy.
Deploy a sample application.
Create a file named podinfo.yaml and copy the following content to the file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  namespace: test
  labels:
    app: podinfo
spec:
  minReadySeconds: 5
  strategy:
    rollingUpdate:
      maxUnavailable: 0
    type: RollingUpdate
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
      labels:
        app: podinfo
    spec:
      containers:
      - name: podinfod
        image: stefanprodan/podinfo:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9898
          name: http
          protocol: TCP
        command:
        - ./podinfo
        - --port=9898
        - --level=info
        livenessProbe:
          exec:
            command:
            - podcli
            - check
            - http
            - localhost:9898/healthz
          initialDelaySeconds: 5
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - podcli
            - check
            - http
            - localhost:9898/readyz
          initialDelaySeconds: 5
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 2000m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 64Mi
---
apiVersion: v1
kind: Service
metadata:
  name: podinfo
  namespace: test
  labels:
    app: podinfo
spec:
  type: ClusterIP
  ports:
  - name: http
    port: 9898
    targetPort: 9898
    protocol: TCP
  selector:
    app: podinfo
Run the following command to deploy the podinfo application:
kubectl apply -n test -f podinfo.yaml
To trigger auto scaling, you must also deploy a load testing service in the test namespace to generate requests.
Create a file named loadtester.yaml and copy the following content to the file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loadtester
  namespace: test
  labels:
    app: loadtester
spec:
  selector:
    matchLabels:
      app: loadtester
  template:
    metadata:
      labels:
        app: loadtester
      annotations:
        prometheus.io/scrape: "true"
    spec:
      containers:
      - name: loadtester
        image: weaveworks/flagger-loadtester:0.18.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8080
        command:
        - ./loadtester
        - -port=8080
        - -log-level=info
        - -timeout=1h
        livenessProbe:
          exec:
            command:
            - wget
            - --quiet
            - --tries=1
            - --timeout=4
            - --spider
            - http://localhost:8080/healthz
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - wget
            - --quiet
            - --tries=1
            - --timeout=4
            - --spider
            - http://localhost:8080/healthz
          timeoutSeconds: 5
        resources:
          limits:
            memory: "512Mi"
            cpu: "1000m"
          requests:
            memory: "32Mi"
            cpu: "10m"
        securityContext:
          readOnlyRootFilesystem: true
          runAsUser: 10001
---
apiVersion: v1
kind: Service
metadata:
  name: loadtester
  namespace: test
  labels:
    app: loadtester
spec:
  type: ClusterIP
  selector:
    app: loadtester
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
Run the following command to deploy the load testing service:
kubectl apply -n test -f loadtester.yaml
Check whether the sample application and the load testing service are deployed.
Run the following command to check the pod status:
kubectl get pod -n test
Expected output:
NAME                          READY   STATUS    RESTARTS   AGE
loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m
Run the following commands to log on to the container for load testing and run the hey command to generate loads:
export loadtester=$(kubectl -n test get pod -l "app=loadtester" -o jsonpath='{.items[0].metadata.name}')
kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5s -c 10 -q 2 http://podinfo.test:9898
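In the hey command above, -z sets the test duration, -c the number of concurrent workers, and -q the per-worker query rate, so the expected total load is roughly the product of concurrency and per-worker rate. A simplified sketch (actual throughput also depends on response latency):

```python
def expected_rps(concurrency: int, qps_per_worker: int) -> int:
    """Approximate total request rate generated by hey (-c times -q)."""
    return concurrency * qps_per_worker

# hey -z 5s -c 10 -q 2 generates about 20 requests per second for 5 seconds.
print(expected_rps(10, 2))  # 20
```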
A load is generated, which indicates that the sample application and the load testing service are deployed.
Step 4: Configure an HPA by using ASM metrics
Define an HPA to scale the workloads of the podinfo application based on the number of requests that the application receives per second. When the application receives more than 10 requests per second on average per replica, the HPA increases the number of replicas.
Create a file named hpa.yaml and copy the following code to the file:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
      sum(
          rate(
              istio_requests_total{
                destination_workload="podinfo",
                destination_workload_namespace="test",
                reporter="destination"
              }[1m]
          )
      )
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
  - type: External
    external:
      metric:
        name: prometheus-query
        selector:
          matchLabels:
            query-name: processed-requests-per-second
      target:
        type: AverageValue
        averageValue: "10"
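The PromQL expression in the annotation computes the per-second increase of the istio_requests_total counter over a 1-minute window and sums it across pods. Conceptually, rate() behaves like the following simplified sketch; the real Prometheus function additionally extrapolates to the window boundaries and handles counter resets:

```python
def simple_rate(start_count: float, end_count: float, window_seconds: float) -> float:
    """Simplified PromQL rate(): per-second increase of a counter over a window."""
    return (end_count - start_count) / window_seconds

# A counter that grows from 1200 to 1800 requests over a 60-second window
# yields a rate of 10 requests per second.
print(simple_rate(1200, 1800, 60))  # 10.0
```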
Run the following command to deploy the HPA:
kubectl apply -f hpa.yaml
Check whether the HPA is deployed.
Run the following command to query the custom metrics that are provided by kube-metrics-adapter:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Expected output:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "prometheus-query",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
The output contains the list of custom ASM metrics, which indicates that the HPA is deployed.
Verify auto scaling
Run the following command to log on to the container for load testing and run the hey command to generate loads:
kubectl -n test exec -it ${loadtester} -c loadtester -- sh
~ $ hey -z 5m -c 10 -q 5 http://podinfo.test:9898
Run the following command to check the effect of auto scaling.
Note: Metrics are synchronized every 30 seconds by default, and a container can be scaled only once every 3 to 5 minutes. This delay gives the HPA time to complete a scaling action before a conflicting scaling decision is made.
watch kubectl -n test get hpa/podinfo
Expected output:
NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m
The HPA starts to scale out the workload within 1 minute and continues until the number of requests per second drops below the specified threshold. After the load test is complete, the number of requests per second decreases to zero and the HPA starts to reduce the number of pods. A few minutes later, the number of replicas decreases from the value in the preceding output to one.
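For an External metric with an AverageValue target, the HPA divides the metric total by the current replica count and compares the result to averageValue, which is equivalent to targeting roughly ceil(total / averageValue) replicas within the min/max bounds. The sketch below illustrates this with hypothetical numbers; the real controller also applies a tolerance and stabilization windows, so the observed replica count can differ slightly:

```python
import math

def replicas_for_average_target(metric_total: float, average_value: float,
                                min_replicas: int, max_replicas: int) -> int:
    """Replica count that brings the per-replica average down to the target,
    clamped to the HPA's minReplicas and maxReplicas bounds."""
    desired = math.ceil(metric_total / average_value)
    return max(min_replicas, min(desired, max_replicas))

# About 50 total requests per second against averageValue "10"
# calls for 5 replicas (bounded by minReplicas=1, maxReplicas=10).
print(replicas_for_average_target(50, 10, 1, 10))  # 5
```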