All Products
Search
Document Center

Alibaba Cloud Service Mesh:Implement auto scaling for workloads by using ASM metrics

Last Updated:Sep 13, 2023

Service Mesh (ASM) collects telemetry data for Container Service for Kubernetes (ACK) clusters in a non-intrusive manner, which makes the service communication in the clusters observable. This telemetry feature makes service behaviors observable and helps O&M staff troubleshoot, maintain, and optimize applications without increasing maintenance costs. Based on the four key metrics, including latency, traffic, errors, and saturation, Service Mesh generates a series of metrics for the services that it manages. This topic describes how to implement auto scaling for workloads by using ASM metrics.

Prerequisites

Background information

Service Mesh generates a series of metrics for the services that it manages. For more information, visit Istio Standard Metrics.

Auto scaling is an approach that is used to automatically scale up or down workloads based on resource usage. In Kubernetes, two autoscalers are used to implement auto scaling.

  • Cluster Autoscaler (CA): CAs are used to increase or decrease the number of nodes in a cluster.

  • Horizontal Pod Autoscaler (HPA): HPAs are used to increase or decrease the number of pods that are used to deploy applications.

The aggregation layer of Kubernetes allows third-party applications to extend the Kubernetes API by registering themselves as API add-ons. These add-ons can be used to implement the custom metrics API and allow HPAs to query any metrics. HPAs periodically query core metrics such as CPU utilization and memory usage by using the resource metrics API. In addition, HPAs use the custom metrics API to query application-specific metrics, such as the observability metrics that are provided by ASM. 弹性伸缩

Step 1: Enable Prometheus monitoring for the ASM instance

For more information, see Collect metrics to Managed Service for Prometheus.

Step 2: Deploy the adapter for the custom metrics API

  1. Run the following command to download the installation package of the adapter. Then, install and deploy the adapter for the custom metrics API in the ACK cluster.

    For more information, visit kube-metrics-adapter.

    ## Use Helm 3. 
    helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter  --set prometheus.url=http://prometheus.istio-system.svc:9090
  2. After the installation is complete, run the following commands to check whether kube-metrics-adapter is enabled.

    • Run the following command to verify that autoscaling/v2beta exists:

      kubectl api-versions |grep "autoscaling/v2beta"

      Expected output:

      autoscaling/v2beta
    • Run the following command to check the status of the pod of kube-metrics-adapter:

      kubectl get po -n kube-system |grep metrics-adapter

      Expected output:

      asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2****          1/1     Running   0          19s
    • Run the following command to query the custom metrics that are provided by kube-metrics-adapter:

      kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

      Expected output:

      {
        "kind": "APIResourceList",
        "apiVersion": "v1",
        "groupVersion": "external.metrics.k8s.io/v1beta1",
        "resources": []
      }

Step 3: Deploy a sample application

  1. Create a namespace named test. For more information, see Manage namespaces and resource quotas.

  2. Enable automatic sidecar proxy injection. For more information, see Install a sidecar proxy.

  3. Deploy a sample application.

    1. Create a file named podinfo.yaml and copy the following content to the file:

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: podinfo
        namespace: test
        labels:
          app: podinfo
      spec:
        minReadySeconds: 5
        strategy:
          rollingUpdate:
            maxUnavailable: 0
          type: RollingUpdate
        selector:
          matchLabels:
            app: podinfo
        template:
          metadata:
            annotations:
              prometheus.io/scrape: "true"
            labels:
              app: podinfo
          spec:
            containers:
            - name: podinfod
              image: stefanprodan/podinfo:latest
              imagePullPolicy: IfNotPresent
              ports:
              - containerPort: 9898
                name: http
                protocol: TCP
              command:
              - ./podinfo
              - --port=9898
              - --level=info
              livenessProbe:
                exec:
                  command:
                  - podcli
                  - check
                  - http
                  - localhost:9898/healthz
                initialDelaySeconds: 5
                timeoutSeconds: 5
              readinessProbe:
                exec:
                  command:
                  - podcli
                  - check
                  - http
                  - localhost:9898/readyz
                initialDelaySeconds: 5
                timeoutSeconds: 5
              resources:
                limits:
                  cpu: 2000m
                  memory: 512Mi
                requests:
                  cpu: 100m
                  memory: 64Mi
      ---
      apiVersion: v1
      kind: Service
      metadata:
        name: podinfo
        namespace: test
        labels:
          app: podinfo
      spec:
        type: ClusterIP
        ports:
          - name: http
            port: 9898
            targetPort: 9898
            protocol: TCP
        selector:
          app: podinfo
    2. Run the following command to deploy the podinfo application:

      kubectl apply -n test -f podinfo.yaml
  4. To trigger auto scaling, you must deploy a load testing service in the test namespace for triggering requests.

    1. Create a file named loadtester.yaml and copy the following content to the file:

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: loadtester
        namespace: test
        labels:
          app: loadtester
      spec:
        selector:
          matchLabels:
            app: loadtester
        template:
          metadata:
            labels:
              app: loadtester
            annotations:
              prometheus.io/scrape: "true"
          spec:
            containers:
              - name: loadtester
                image: weaveworks/flagger-loadtester:0.18.0
                imagePullPolicy: IfNotPresent
                ports:
                  - name: http
                    containerPort: 8080
                command:
                  - ./loadtester
                  - -port=8080
                  - -log-level=info
                  - -timeout=1h
                livenessProbe:
                  exec:
                    command:
                      - wget
                      - --quiet
                      - --tries=1
                      - --timeout=4
                      - --spider
                      - http://localhost:8080/healthz
                  timeoutSeconds: 5
                readinessProbe:
                  exec:
                    command:
                      - wget
                      - --quiet
                      - --tries=1
                      - --timeout=4
                      - --spider
                      - http://localhost:8080/healthz
                  timeoutSeconds: 5
                resources:
                  limits:
                    memory: "512Mi"
                    cpu: "1000m"
                  requests:
                    memory: "32Mi"
                    cpu: "10m"
                securityContext:
                  readOnlyRootFilesystem: true
                  runAsUser: 10001
      ---
      apiVersion: v1
      kind: Service
      metadata:
        name: loadtester
        namespace: test
        labels:
          app: loadtester
      spec:
        type: ClusterIP
        selector:
          app: loadtester
        ports:
          - name: http
            port: 80
            protocol: TCP
            targetPort: http
    2. Run the following command to deploy the load testing service:

      kubectl apply -n test -f loadtester.yaml
  5. Check whether the sample application and the load testing service are deployed.

    1. Run the following command to check the pod status:

      kubectl get pod -n test

      Expected output:

      NAME                          READY   STATUS    RESTARTS   AGE
      loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
      podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m
    2. Run the following commands to log on to the container for load testing and run the hey command to generate loads:

      export loadtester=$(kubectl -n test get pod -l "app=loadtester" -o jsonpath='{.items[0].metadata.name}')
      kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5s -c 10 -q 2 http://podinfo.test:9898

      A load is generated, which indicates that the sample application and the load testing service are deployed.

Step 4: Configure an HPA by using ASM metrics

Define an HPA to scale the workloads of the podinfo application based on the number of requests that the podinfo application receives per second. When more than 10 requests are received per second on average, the HPA increases the number of replicas.

  1. Create a file named hpa.yaml and copy the following code to the file:

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: podinfo
      namespace: test
      annotations:
        metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
          sum(
              rate(
                  istio_requests_total{
                    destination_workload="podinfo",
                    destination_workload_namespace="test",
                    reporter="destination"
                  }[1m]
              )
          ) 
    spec:
      maxReplicas: 10
      minReplicas: 1
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: podinfo
      metrics:
        - type: External
          external:
            metric:
              name: prometheus-query
              selector:
                matchLabels:
                  query-name: processed-requests-per-second
            target:
              type: AverageValue
              averageValue: "10"
  2. Run the following command to deploy the HPA:

    kubectl apply -f hpa.yaml
  3. Run the following command to check whether the HPA is deployed.

    Run the following command to query the custom metrics that are provided by kube-metrics-adapter:

    kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

    Expected output:

    {
      "kind": "APIResourceList",
      "apiVersion": "v1",
      "groupVersion": "external.metrics.k8s.io/v1beta1",
      "resources": [
        {
          "name": "prometheus-query",
          "singularName": "",
          "namespaced": true,
          "kind": "ExternalMetricValueList",
          "verbs": [
            "get"
          ]
        }
      ]
    }

    The output contains the list of custom ASM metrics, which indicates that the HPA is deployed.

Verify auto scaling

  1. Run the following command to log on to the container for load testing and run the hey command to generate loads:

    kubectl -n test exec -it ${loadtester} -c loadtester -- sh
    ~ $ hey -z 5m -c 10 -q 5 http://podinfo.test:9898
  2. Run the following command to check the effect of auto scaling.

    Note

    Metrics are synchronized every 30 seconds by default. The container can be scaled only once every 3 to 5 minutes. This way, the HPA can reserve time for automatic scaling before the conflict strategy is executed.

    watch kubectl -n test get hpa/podinfo

    Expected output:

    NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
    podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m

    The HPA starts to scale up workloads in 1 minute until the number of requests per second decreases under the specified threshold. After the load testing is complete, the number of requests per second decreases to zero. Then, the HPA starts to decrease the number of pods. A few minutes later, the number of replicas decreases from the value in the preceding output to one.