Service Mesh (ASM) collects telemetry data for Container Service for Kubernetes (ACK) clusters in a non-intrusive manner, making service communication in these clusters observable. This telemetry makes service behavior observable and helps O&M staff troubleshoot, maintain, and optimize applications without additional maintenance costs. Based on the four key metrics of latency, traffic, errors, and saturation, ASM generates a series of metrics for the services that it manages. This topic describes how to implement auto scaling for workloads by using ASM metrics.
Prerequisites
An ACK cluster is created. For more information, see Create an ACK managed cluster.
An ASM instance is created. For more information, see Create an ASM instance.
A Prometheus instance and a Grafana instance are deployed in the ACK cluster. For more information, see Use open source Prometheus to monitor an ACK cluster.
A Prometheus instance is deployed to monitor the ASM instance. For more information, see Monitor ASM instances by using a self-managed Prometheus instance.
Background information
Service Mesh generates a series of metrics for the services that it manages. For more information, visit Istio Standard Metrics.
Auto scaling is an approach that automatically scales workloads up or down based on resource usage. Kubernetes provides two autoscalers to implement auto scaling.
Cluster Autoscaler (CA): CAs are used to increase or decrease the number of nodes in a cluster.
Horizontal Pod Autoscaler (HPA): HPAs are used to increase or decrease the number of pods that are used to deploy applications.
The aggregation layer of Kubernetes allows third-party applications to extend the Kubernetes API by registering themselves as API add-ons. These add-ons can be used to implement the custom metrics API and allow HPAs to query any metrics. HPAs periodically query core metrics such as CPU utilization and memory usage by using the resource metrics API. In addition, HPAs use the custom metrics API to query application-specific metrics, such as the observability metrics that are provided by ASM.
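To make the HPA's behavior concrete, its core scaling decision can be approximated as the ratio of the observed metric to the target value. The following sketch is a simplified model of that formula, not the actual Kubernetes controller implementation, which additionally applies tolerances, stabilization windows, and rate limits:

```python
import math

def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Approximate the HPA scaling formula:
    desired = ceil(current_replicas * current_value / target_value)."""
    return math.ceil(current_replicas * (current_value / target_value))

# If 4 replicas each observe 20 requests per second against a target of 10,
# the HPA scales out to 8 replicas.
print(desired_replicas(4, 20, 10))  # 8
```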
Step 1: Enable Prometheus monitoring for the ASM instance
For more information, see Collect metrics to Managed Service for Prometheus.
Step 2: Deploy the adapter for the custom metrics API
Run the following command to download the installation package of the adapter. Then, install and deploy the adapter for the custom metrics API in the ACK cluster.
For more information, visit kube-metrics-adapter.
# Use Helm 3.
helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter --set prometheus.url=http://prometheus.istio-system.svc:9090
After the installation is complete, run the following commands to check whether kube-metrics-adapter is enabled.
Run the following command to verify that autoscaling/v2beta exists:
kubectl api-versions | grep "autoscaling/v2beta"
Expected output:
autoscaling/v2beta
Run the following command to check the status of the pod of kube-metrics-adapter:
kubectl get po -n kube-system | grep metrics-adapter
Expected output:
asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2**** 1/1 Running 0 19s
Run the following command to query the custom metrics that are provided by kube-metrics-adapter:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Expected output:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": []
}
Step 3: Deploy a sample application
Create a namespace named test. For more information, see Manage namespaces and resource quotas.
Enable automatic sidecar proxy injection. For more information, see Install a sidecar proxy.
Deploy a sample application.
Create a file named podinfo.yaml and copy the following content to the file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  namespace: test
  labels:
    app: podinfo
spec:
  minReadySeconds: 5
  strategy:
    rollingUpdate:
      maxUnavailable: 0
    type: RollingUpdate
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
      labels:
        app: podinfo
    spec:
      containers:
      - name: podinfod
        image: stefanprodan/podinfo:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9898
          name: http
          protocol: TCP
        command:
        - ./podinfo
        - --port=9898
        - --level=info
        livenessProbe:
          exec:
            command:
            - podcli
            - check
            - http
            - localhost:9898/healthz
          initialDelaySeconds: 5
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - podcli
            - check
            - http
            - localhost:9898/readyz
          initialDelaySeconds: 5
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 2000m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 64Mi
---
apiVersion: v1
kind: Service
metadata:
  name: podinfo
  namespace: test
  labels:
    app: podinfo
spec:
  type: ClusterIP
  ports:
  - name: http
    port: 9898
    targetPort: 9898
    protocol: TCP
  selector:
    app: podinfo
Run the following command to deploy the podinfo application:
kubectl apply -n test -f podinfo.yaml
To trigger auto scaling, you must also deploy a load testing service in the test namespace to generate requests.
Create a file named loadtester.yaml and copy the following content to the file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loadtester
  namespace: test
  labels:
    app: loadtester
spec:
  selector:
    matchLabels:
      app: loadtester
  template:
    metadata:
      labels:
        app: loadtester
      annotations:
        prometheus.io/scrape: "true"
    spec:
      containers:
      - name: loadtester
        image: weaveworks/flagger-loadtester:0.18.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8080
        command:
        - ./loadtester
        - -port=8080
        - -log-level=info
        - -timeout=1h
        livenessProbe:
          exec:
            command:
            - wget
            - --quiet
            - --tries=1
            - --timeout=4
            - --spider
            - http://localhost:8080/healthz
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - wget
            - --quiet
            - --tries=1
            - --timeout=4
            - --spider
            - http://localhost:8080/healthz
          timeoutSeconds: 5
        resources:
          limits:
            memory: "512Mi"
            cpu: "1000m"
          requests:
            memory: "32Mi"
            cpu: "10m"
        securityContext:
          readOnlyRootFilesystem: true
          runAsUser: 10001
---
apiVersion: v1
kind: Service
metadata:
  name: loadtester
  namespace: test
  labels:
    app: loadtester
spec:
  type: ClusterIP
  selector:
    app: loadtester
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
Run the following command to deploy the load testing service:
kubectl apply -n test -f loadtester.yaml
Check whether the sample application and the load testing service are deployed.
Run the following command to check the pod status:
kubectl get pod -n test
Expected output:
NAME                          READY   STATUS    RESTARTS   AGE
loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m
Run the following commands to log on to the container for load testing and run the hey command to generate loads:
export loadtester=$(kubectl -n test get pod -l "app=loadtester" -o jsonpath='{.items[0].metadata.name}')
kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5s -c 10 -q 2 http://podinfo.test:9898
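In the hey command above, -z sets the test duration, -c the number of concurrent workers, and -q the per-worker query rate, so the expected total load is roughly the product of concurrency and per-worker rate. A simplified sketch (actual throughput also depends on response latency):

```python
def expected_rps(concurrency: int, qps_per_worker: int) -> int:
    """Approximate total request rate generated by hey (-c times -q)."""
    return concurrency * qps_per_worker

# hey -z 5s -c 10 -q 2 generates about 20 requests per second for 5 seconds.
print(expected_rps(10, 2))  # 20
```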
A load is generated, which indicates that the sample application and the load testing service are deployed.
Step 4: Configure an HPA by using ASM metrics
Define an HPA to scale the workloads of the podinfo application based on the number of requests that the application receives per second. When the application receives more than 10 requests per second on average per replica, the HPA increases the number of replicas.
Create a file named hpa.yaml and copy the following code to the file:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
      sum(
          rate(
              istio_requests_total{
                destination_workload="podinfo",
                destination_workload_namespace="test",
                reporter="destination"
              }[1m]
          )
      )
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
  - type: External
    external:
      metric:
        name: prometheus-query
        selector:
          matchLabels:
            query-name: processed-requests-per-second
      target:
        type: AverageValue
        averageValue: "10"
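The PromQL expression in the annotation computes the per-second increase of the istio_requests_total counter over a 1-minute window and sums it across pods. Conceptually, rate() behaves like the following simplified sketch; the real Prometheus function additionally extrapolates to the window boundaries and handles counter resets:

```python
def simple_rate(start_count: float, end_count: float, window_seconds: float) -> float:
    """Simplified PromQL rate(): per-second increase of a counter over a window."""
    return (end_count - start_count) / window_seconds

# A counter that grows from 1200 to 1800 requests over a 60-second window
# yields a rate of 10 requests per second.
print(simple_rate(1200, 1800, 60))  # 10.0
```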
Run the following command to deploy the HPA:
kubectl apply -f hpa.yaml
Check whether the HPA is deployed.
Run the following command to query the custom metrics that are provided by kube-metrics-adapter:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Expected output:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "prometheus-query",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
The output contains the list of custom ASM metrics, which indicates that the HPA is deployed.
Verify auto scaling
Run the following command to log on to the container for load testing and run the hey command to generate loads:
kubectl -n test exec -it ${loadtester} -c loadtester -- sh
~ $ hey -z 5m -c 10 -q 5 http://podinfo.test:9898
Run the following command to check the effect of auto scaling.
Note: Metrics are synchronized every 30 seconds by default, and a container can be scaled only once every 3 to 5 minutes. This delay gives the HPA time to complete a scaling action before a conflicting scaling decision is made.
watch kubectl -n test get hpa/podinfo
Expected output:
NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m
The HPA starts to scale out the workload within 1 minute and continues until the number of requests per second drops below the specified threshold. After the load test is complete, the number of requests per second decreases to zero and the HPA starts to reduce the number of pods. A few minutes later, the number of replicas decreases from the value in the preceding output to one.
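For an External metric with an AverageValue target, the HPA divides the metric total by the current replica count and compares the result to averageValue, which is equivalent to targeting roughly ceil(total / averageValue) replicas within the min/max bounds. The sketch below illustrates this with hypothetical numbers; the real controller also applies a tolerance and stabilization windows, so the observed replica count can differ slightly:

```python
import math

def replicas_for_average_target(metric_total: float, average_value: float,
                                min_replicas: int, max_replicas: int) -> int:
    """Replica count that brings the per-replica average down to the target,
    clamped to the HPA's minReplicas and maxReplicas bounds."""
    desired = math.ceil(metric_total / average_value)
    return max(min_replicas, min(desired, max_replicas))

# About 50 total requests per second against averageValue "10"
# calls for 5 replicas (bounded by minReplicas=1, maxReplicas=10).
print(replicas_for_average_target(50, 10, 1, 10))  # 5
```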