
Container Service for Kubernetes:Workload scaling FAQ

Last Updated:Oct 30, 2024

This topic provides answers to some frequently asked questions about workload scaling (HPA and CronHPA).

What do I do if unknown is displayed in the current field in the HPA metrics?

If the current field of the Horizontal Pod Autoscaler (HPA) metrics shows unknown, it indicates that kube-controller-manager cannot collect resource metrics from the monitoring data source. Consequently, HPA fails to perform scaling.

Name:                                                  kubernetes-tutorial-deployment
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Mon, 10 Jun 2019 11:46:48 +0530
Reference:                                             Deployment/kubernetes-tutorial-deployment
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 2%
Min replicas:                                          1
Max replicas:                                          4
Deployment pods:                                       1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Events:
  Type     Reason                   Age                      From                       Message
  ----     ------                   ----                     ----                       -------
  Warning  FailedGetResourceMetric  3m3s (x1009 over 4h18m)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)

Cause 1: The data source from which resource metrics are collected is unavailable

Run the kubectl top pod command to check whether metric data of monitored pods is returned. If no metric data is returned, run the kubectl get apiservice command to check whether the metrics-server component is available. The following output is an example of the returned data:

NAME                                   SERVICE                      AVAILABLE   AGE
v1.                                    Local                        True        29h
v1.admissionregistration.k8s.io        Local                        True        29h
v1.apiextensions.k8s.io                Local                        True        29h
v1.apps                                Local                        True        29h
v1.authentication.k8s.io               Local                        True        29h
v1.authorization.k8s.io                Local                        True        29h
v1.autoscaling                         Local                        True        29h
v1.batch                               Local                        True        29h
v1.coordination.k8s.io                 Local                        True        29h
v1.monitoring.coreos.com               Local                        True        29h
v1.networking.k8s.io                   Local                        True        29h
v1.rbac.authorization.k8s.io           Local                        True        29h
v1.scheduling.k8s.io                   Local                        True        29h
v1.storage.k8s.io                      Local                        True        29h
v1alpha1.argoproj.io                   Local                        True        29h
v1alpha1.fedlearner.k8s.io             Local                        True        5h11m
v1beta1.admissionregistration.k8s.io   Local                        True        29h
v1beta1.alicloud.com                   Local                        True        29h
v1beta1.apiextensions.k8s.io           Local                        True        29h
v1beta1.apps                           Local                        True        29h
v1beta1.authentication.k8s.io          Local                        True        29h
v1beta1.authorization.k8s.io           Local                        True        29h
v1beta1.batch                          Local                        True        29h
v1beta1.certificates.k8s.io            Local                        True        29h
v1beta1.coordination.k8s.io            Local                        True        29h
v1beta1.events.k8s.io                  Local                        True        29h
v1beta1.extensions                     Local                        True        29h
...
v1beta1.metrics.k8s.io                 kube-system/metrics-server   True        29h
...
v1beta1.networking.k8s.io              Local                        True        29h
v1beta1.node.k8s.io                    Local                        True        29h
v1beta1.policy                         Local                        True        29h
v1beta1.rbac.authorization.k8s.io      Local                        True        29h
v1beta1.scheduling.k8s.io              Local                        True        29h
v1beta1.storage.k8s.io                 Local                        True        29h
v1beta2.apps                           Local                        True        29h
v2beta1.autoscaling                    Local                        True        29h
v2beta2.autoscaling                    Local                        True        29h

If the API service for v1beta1.metrics.k8s.io is not kube-system/metrics-server, check whether the API service has been overwritten by Prometheus Operator. If it has, use the following YAML template to redeploy metrics-server:

apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100

If the issue persists, go to the cluster details page in the ACK console and choose Operations > Add-ons to check whether metrics-server is installed. For more information, see metrics-server.

Cause 2: Metrics cannot be collected during a rolling update or scale-out activity

By default, metrics-server collects metrics at intervals of 1 minute. However, metrics-server must wait a few minutes before it can collect metrics after a rolling update or scale-out activity is completed. We recommend that you query metrics 2 minutes after a rolling update or scale-out activity is completed.

Cause 3: The request field is not specified

By default, HPA obtains the CPU or memory usage of the pod by calculating the value of resource usage/resource request. If the resource request is not specified in the pod configurations, HPA cannot calculate the resource usage. Therefore, you must make sure that the request field is specified in the resource parameter of the pod configurations.
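For example, the following pod template fragment sets the request field so that HPA can calculate CPU and memory utilization. The container name and request values are illustrative:

```yaml
# Pod template fragment. HPA computes utilization as usage/request,
# so resources.requests must be set for the containers.
spec:
  containers:
  - name: nginx            # illustrative container name
    image: nginx:1.7.9
    resources:
      requests:
        cpu: 250m          # CPU utilization = actual CPU usage / 250m
        memory: 256Mi
```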

Cause 4: The metric name is incorrect

Check whether the metric name is correct. The metric name is case-sensitive. For example, if the cpu metric supported by HPA is accidentally written as CPU, unknown is displayed in the current field.

What do I do if HPA cannot collect metrics and fails to perform scaling?

If HPA cannot collect metrics, unknown is displayed in the current field of the HPA metrics. Without these metrics, HPA cannot determine whether scaling is required and therefore does not scale pods. Refer to FAQ about node auto scaling to troubleshoot and fix this issue.

What do I do if excess pods are added by HPA during a rolling update?

During a rolling update, kube-controller-manager performs zero filling on pods whose monitoring data cannot be collected. This may cause HPA to add an excessive number of pods. You can perform the following steps to fix this issue.

Fix this issue for all workloads in the cluster

Update metrics-server to the latest version and add the following parameter to the startup settings of metrics-server.

The following configuration takes effect on all workloads in the cluster.

# Add the following configuration to the startup settings of metrics-server. 
--enable-hpa-rolling-update-skipped=true  

Fix this issue for specific workloads

You can use one of the following methods to fix this issue for specific workloads:

  • Method 1: Add the following annotation to the template of a workload to skip HPA during rolling updates.

    # Add the following annotation to the spec.template.metadata.annotations parameter of the workload configuration to skip HPA during rolling updates. 
    HPARollingUpdateSkipped: "true"

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment-basic
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
          annotations:
            HPARollingUpdateSkipped: "true"  # Skip HPA during rolling updates.
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9
            ports:
            - containerPort: 80
  • Method 2: Add the following annotation to the template of a workload to skip the warm-up period before rolling updates.

    # Add the following annotation to the spec.template.metadata.annotations parameter of the workload configuration to skip the warm-up period before rolling updates. 
    HPAScaleUpDelay: 3m # You can change the value based on your business requirements.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment-basic
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
          annotations:
            HPAScaleUpDelay: 3m  # This setting indicates that HPA takes effect 3 minutes after the pods are created. Valid units: s and m. s indicates seconds and m indicates minutes.
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9
            ports:
            - containerPort: 80

What do I do if HPA does not scale pods when the scaling threshold is reached?

HPA may not scale pods even if the CPU or memory usage drops below the scale-in threshold or exceeds the scale-out threshold. HPA also takes other factors into consideration when it scales pods. For example, HPA checks whether the current scale-out activity triggers a scale-in activity or the current scale-in activity triggers a scale-out activity. This prevents repetitive scaling and unnecessary resource consumption.

For example, if the scale-out threshold is 80% and you have two pods whose CPU utilizations are both 70%, the pods are not scaled in. This is because the CPU utilization of one pod may be higher than 80% after the pods are scaled in. This triggers another scale-out activity.
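The decision above follows from the replica formula that the HPA controller uses, desired = ceil[current × (current metric/target metric)]. A quick sketch in Python, using the utilization figures from the example:

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric):
    # Replica formula used by the HPA controller:
    # desired = ceil[current x (current metric / target metric)]
    return math.ceil(current_replicas * current_metric / target_metric)

# Two pods at 70% CPU with an 80% target: ceil(2 x 70/80) = ceil(1.75) = 2,
# so HPA keeps both pods. Scaling in to one pod would concentrate the load
# (about 140% on the remaining pod) and immediately trigger another scale-out.
print(desired_replicas(2, 70, 80))  # 2
```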

How do I configure the metric collection interval of HPA?

For metrics-server versions later than 0.2.1-b46d98c-aliyun, specify the --metric-resolution parameter in the startup settings. Example: --metric-resolution=15s.

Can CronHPA and HPA interact without conflicts? How do I enable CronHPA to interact with HPA?

CronHPA and HPA can interact without conflicts. ACK modifies the CronHPA configurations by setting scaleTargetRef to the scaling object of HPA. This way, only HPA scales the application that is specified by scaleTargetRef. This also enables CronHPA to detect the status of HPA. CronHPA does not directly change the number of pods for the Deployment. CronHPA triggers HPA to scale the pods. This prevents conflicts between CronHPA and HPA. For more information about how to enable CronHPA and HPA to interact without conflicts, see Interaction between CronHPA and HPA.
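For reference, a CronHPA resource whose scaleTargetRef points at an HPA rather than at a Deployment might look as follows. This is a sketch based on the open source kubernetes-cronhpa-controller CRD; the HPA name, schedule, and target size are hypothetical:

```yaml
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: cronhpa-sample
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: nginx-deployment-basic-hpa   # hypothetical HPA name
  jobs:
  - name: scale-up-workday-morning
    schedule: "0 0 8 * * *"            # 6-field cron expression: 08:00:00 every day
    targetSize: 10                     # CronHPA adjusts the HPA, not the Deployment, at this time
```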

How do I fix the issue that excess pods are added by HPA when CPU or memory usage rapidly increases?

When the pods of Java applications or applications powered by Java frameworks start, the CPU and memory usage may be high for a few minutes during the warm-up period. This may trigger HPA to scale out the pods. To fix this issue, update the version of metrics-server provided by ACK to 0.3.9.6 or later and add annotations to the pod configurations to prevent HPA from accidentally triggering scaling activities. For more information about how to update metrics-server, see Update the metrics-server component before you update the Kubernetes version to 1.12.

The following YAML template provides the sample pod configurations that prevent HPA from accidentally triggering scaling activities in this scenario.

## In this example, a Deployment is used.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-basic
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        HPAScaleUpDelay: 3m # This setting indicates that HPA takes effect 3 minutes after the pods are created. Valid units: s and m. s indicates seconds and m indicates minutes. 
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9 # Replace with your own <image_name:tag>.
        ports:
        - containerPort: 80 

What do I do if HPA scales out an application while the metric value in the audit log is lower than the threshold?

Cause

HPA calculates the desired number of replicated pods based on the ratio of the current metric value to the desired metric value: Desired number of replicated pods = ceil[Current number of replicated pods × (Current metric value/Desired metric value)].

The formula indicates that the accuracy of the result depends on the accuracies of the current number of replicated pods, the current metric value, and the desired metric value. For example, when HPA collects metrics about the current number of replicated pods, HPA first queries the subresource named scale of the object specified by the scaleTargetRef parameter and then selects pods based on the label specified in the Selector field in the status section of the scale subresource. If some pods queried by HPA do not belong to the object specified by the scaleTargetRef parameter, the desired number of replicated pods calculated by HPA may not meet your expectations. For example, HPA may scale out the application while the real-time metric value is lower than the threshold.

The number of matching pods may be inaccurate due to the following reasons:

  • A rolling update is in progress.

  • Pods that do not belong to the object specified by the scaleTargetRef parameter have the label specified in the Selector field in the status section of the scale subresource. Run the following command to query the pods:

    kubectl get pods -n {Namespace} -l {Value of the Selector field in the status section of the subresource named scale}
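A hypothetical calculation shows how a stray pod that matches the selector can raise the desired replica count even though all pods that belong to the workload are below the target:

```python
import math

target = 50.0                     # desired average CPU utilization (%)
current_replicas = 2              # replicas reported by the scale subresource
owned_pod_usage = [40.0, 40.0]    # pods that belong to scaleTargetRef, both below target
stray_pod_usage = [90.0]          # unrelated pod that carries the same label

# HPA averages the metric over every pod matched by the selector.
matched = owned_pod_usage + stray_pod_usage
current_metric = sum(matched) / len(matched)   # about 56.7%

desired = math.ceil(current_replicas * current_metric / target)
print(desired)  # 3 -> scale-out, although the owned pods are below the target
```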

Solution

  • If a rolling update is in progress, refer to FAQ about node auto scaling to resolve this issue.

  • If pods that do not belong to the object specified by the scaleTargetRef parameter have the label specified in the Selector field in the status section of the scale subresource, locate these pods and then change the label. You can also delete the pods that you no longer require.

Can HPA determine the order in which pods are scaled in?

No, HPA cannot determine the order in which pods are scaled in. HPA can automatically increase or decrease the number of pods based on defined metrics. However, HPA cannot determine which pods are terminated first. The order in which pods are terminated and the graceful shutdown time of the pods are determined by the controller that manages the pods.

What does the unit of the utilization metric collected by HPA mean?

The unit of the utilization metric collected by HPA is m, which stands for the prefix milli-. The prefix means one thousandth. The value of the utilization metric is an integer. For example, if the value of the tcp_connection_counts metric is 70000m, the value is equal to 70.
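Converting a milli-unit value back to the actual metric value is a division by 1,000:

```python
# HPA displays external metric values in milli-units ("m" = 1/1000).
raw = "70000m"                          # value as shown for tcp_connection_counts
actual = int(raw.rstrip("m")) / 1000
print(actual)  # 70.0, i.e. 70 TCP connections
```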

What do I do if unknown is displayed in the TARGETS column after I run the kubectl get hpa command?

Perform the following operations to troubleshoot the issue:

  1. Run the kubectl describe hpa <hpa_name> command to check why HPA becomes abnormal.

    • If the value of AbleToScale is False in the Conditions field, check whether the Deployment is created as expected.

    • If the value of ScalingActive is False in the Conditions field, proceed to the next step.

  2. Run the kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/" command. If Error from server (NotFound): the server could not find the requested resource is returned, verify the status of alibaba-cloud-metrics-adapter.

    If the status of alibaba-cloud-metrics-adapter is normal, check whether the HPA metrics are related to the Ingress. If the metrics are related to the Ingress, make sure that you deploy the Simple Log Service component before ack-alibaba-cloud-metrics-adapter is deployed. For more information, see Analyze and monitor the access log of nginx-ingress.

  3. Make sure that the values of the HPA metrics are valid. The value of sls.ingress.route must be in the <namespace>-<svc>-<port> format.

    • namespace: the namespace to which the Ingress belongs.

    • svc: the name of the Service that you selected when you created the Ingress.

    • port: the port of the Service.
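The expected value can be assembled mechanically. In this sketch the namespace, Service name, and port are hypothetical:

```python
def sls_ingress_route(namespace: str, svc: str, port: int) -> str:
    """Build the sls.ingress.route value in the <namespace>-<svc>-<port> format."""
    return f"{namespace}-{svc}-{port}"

# Hypothetical Ingress: Service "nginx-svc" on port 80 in namespace "default".
print(sls_ingress_route("default", "nginx-svc", 80))  # default-nginx-svc-80
```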

How do I find the metrics that are supported by HPA?

For more information about the metrics that are supported by HPA, see Alibaba Cloud metrics adapter. The following list describes the commonly used metrics. All of them take sls.ingress.route as the additional parameter.

  • sls_ingress_qps: The number of requests that the Ingress can process per second based on a specific routing rule.

  • sls_alb_ingress_qps: The number of requests that the ALB Ingress can process per second based on a specific routing rule.

  • sls_ingress_latency_avg: The average latency of all requests.

  • sls_ingress_latency_p50: The maximum latency for the fastest 50% of all requests.

  • sls_ingress_latency_p95: The maximum latency for the fastest 95% of all requests.

  • sls_ingress_latency_p99: The maximum latency for the fastest 99% of all requests.

  • sls_ingress_latency_p9999: The maximum latency for the fastest 99.99% of all requests.

  • sls_ingress_inflow: The inbound bandwidth of the Ingress.

How do I configure horizontal autoscaling after I customize the format of NGINX Ingress logs?

Refer to Implement horizontal auto scaling based on Alibaba Cloud metrics to perform horizontal pod autoscaling based on the Ingress metrics that are collected by Simple Log Service. You must configure Simple Log Service to collect NGINX Ingress logs.

  • By default, Simple Log Service is enabled when you create a cluster. If you use the default log collection settings, you can view the log analysis reports and real-time status of NGINX Ingresses in the Simple Log Service console after you create the cluster.

  • If you disable Simple Log Service when you create an ACK cluster, you cannot perform horizontal pod autoscaling based on the Ingress metrics that are collected by Simple Log Service. You must enable Simple Log Service for the cluster before you can use this feature. For more information, see Analyze and monitor the access log of nginx-ingress-controller.

  • The AliyunLogConfig that is generated when you enable Simple Log Service for the cluster for the first time applies only to the default log format that ACK defines for the Ingress controller. If you have changed the log format, you must modify the processor_regex settings in the AliyunLogConfig. For more information, see Use CRDs to collect container logs in DaemonSet mode.