By Alwyn Botha, Alibaba Cloud Community Blog author.
This tutorial is the second part of a two-part series. You can find the other tutorial in this series here.
In this second part, we will cover how downscaling works with Kubernetes's horizontal pod autoscaler and then consider the overall sequence of events for auto scaling. After this, the article walks through another example, goes into the details of how cleanup works, discusses the underlying algorithm, and looks at what exactly downscale delay is and how you can configure it.
So far in this two-part tutorial, we have seen that scaling up works well. Now it's time to see how auto scaling works in the other direction. In theory, if we reduce the workload, the autoscaler should scale the deployment down as well.
To test this theory, you'll want to switch to the terminal running the wget loop and press Ctrl+C to stop it. With this, the workload on the deployment drops to zero, and over time the horizontal pod autoscaler should reduce the number of Pods accordingly.
The horizontal pod autoscaler does support downscaling, and it applies a cooldown/delay before doing so. For more information, see the Support for cooldown/delay section in the Kubernetes documentation.
Our results, shown below, confirm this. It took around five minutes for the downscaling to occur.
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 37%/45% 1 10 5 7m10s
Thu Feb 21 15:45:09 SAST 2019
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 37%/45% 1 10 5 7m43s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 1%/45% 1 10 5 8m44s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 0%/45% 1 10 5 9m20s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 0%/45% 1 10 5 10m
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 0%/45% 1 10 5 12m
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 0%/45% 1 10 1 13m
To summarize the above output, it took about three minutes for the metrics server to report the overall deployment CPU load as 0%, which you can see in the TARGETS column. REPLICAS stayed at 5 even while the CPU was at 0%. At eight minutes and 44 seconds the measured CPU use was 1%, and only at the 13-minute mark did REPLICAS scale down to one.
According to this page about the horizontal pod autoscaler on GitHub:
Autoscaler works in a conservative way. If a new user load appears, it is important for us to rapidly increase the number of pods, so that user requests will not be rejected. In other words, lowering the number of pods is not that urgent.
Starting and stopping pods may introduce noise to the metric (for instance, starting may temporarily increase CPU). So, after each action, the autoscaler should wait some time for reliable data.
Scale-up can only happen if there was no rescaling within the last 3 minutes.
Scale-down will wait for 5 minutes from the last rescaling.
You will see evidence of the behavior described above throughout this tutorial. The output of the kubectl describe command, in particular the last event line, explains it perfectly:
horizontal-pod-autoscaler New size: 1; reason: All metrics below target
Below is what the entire output looks like:
kubectl describe horizontalpodautoscaler.autoscaling/my-hpa-deployment
Name: my-hpa-deployment
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 0% (0) / 45%
Min replicas: 1
Max replicas: 10
Deployment pods: 1 current / 1 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ScaleDownStabilized recent recommendations were higher than current one, applying the highest recent recommendation
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 10m horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 10m horizontal-pod-autoscaler New size: 5; reason:
Normal SuccessfulRescale 48s horizontal-pod-autoscaler New size: 1; reason: All metrics below target
Consider the Conditions output above. ScaleDownStabilized appears, as expected. Again, remember that our auto scale definition is:
kubectl autoscale deployment my-hpa-deployment --cpu-percent=45 --min=1 --max=10
Now, given that min = 1 Pod, this line in the output makes sense:
Deployment pods: 1 current / 1 desired
On the other hand, if our auto scale definition was:
kubectl autoscale deployment my-hpa-deployment --cpu-percent=45 --min=3 --max=10
Then, given that min = 3 Pods, we could expect the line to be:
Deployment pods: 3 current / 1 desired
Based on this, the horizontal pod autoscaler won't scale below the specified minimum of 3. We would therefore expect ScalingLimited to be True in that case, because scaling is limited and the deployment cannot scale down to the desired Pod count of one.
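If you ever want to double-check which limits an existing horizontal pod autoscaler was created with, one option is to read the spec fields back from the API object; the jsonpath expression below is just one way to do it:
kubectl get hpa my-hpa-deployment -o jsonpath='min={.spec.minReplicas} max={.spec.maxReplicas} targetCPU={.spec.targetCPUUtilizationPercentage}{"\n"}'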
The final HPA status output is as follows:
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 0%/45% 1 10 1 15m
Within a running time of 15 minutes, we saw auto scaling in action, in both the up and down directions. Below is the current state of the running Apache child processes; all six child processes are idle.
PID USER PR NI VIRT RES %CPU %MEM TIME+ S COMMAND
5771 www-data 20 0 217.8m 10.9m 0.0 0.5 0:49.64 S apache2 -DFOREGROUND
5772 www-data 20 0 217.8m 10.9m 0.0 0.5 0:49.48 S apache2 -DFOREGROUND
5773 www-data 20 0 217.8m 10.9m 0.0 0.5 0:49.84 S apache2 -DFOREGROUND
5774 www-data 20 0 217.8m 10.9m 0.0 0.5 0:49.61 S apache2 -DFOREGROUND
5775 www-data 20 0 217.8m 10.9m 0.0 0.5 0:49.75 S apache2 -DFOREGROUND
6937 www-data 20 0 217.8m 10.9m 0.0 0.5 0:48.63 S apache2 -DFOREGROUND
In the first part of this tutorial, we only used kubectl describe horizontalpodautoscaler.autoscaling/my-hpa-deployment to obtain status information about our horizontal pod autoscaler. An alternative, used below, is kubectl describe deployment.extensions/my-hpa-deployment. You will see that this command provides detailed information about the deployment itself, although in this case it offers no new insights.
kubectl describe deployment.extensions/my-hpa-deployment
Name: my-hpa-deployment
CreationTimestamp: Thu, 21 Feb 2019 15:37:52 +0200
Labels: app=my-hpa-deploy
Selector: app=my-hpa-pod
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=my-hpa-pod
Containers:
my-hpa-container:
Image: HPA-example:latest
Requests:
cpu: 500m
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
NewReplicaSet: my-hpa-deployment-78d4586d7f (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 15m deployment-controller Scaled up replica set my-hpa-deployment-78d4586d7f to 1
Normal ScalingReplicaSet 12m deployment-controller Scaled up replica set my-hpa-deployment-78d4586d7f to 4
Normal ScalingReplicaSet 12m deployment-controller Scaled up replica set my-hpa-deployment-78d4586d7f to 5
Normal ScalingReplicaSet 2m52s deployment-controller Scaled down replica set my-hpa-deployment-78d4586d7f to 1
A beginner might assume that auto scaling works as a simple chain of commands: the horizontal pod autoscaler tells the deployment to scale, and the deployment in turn tells the ReplicaSet. However, this is not quite how it works. A more accurate description, as described in this document, is as follows:
The deployment does not signal the ReplicaSet directly. Rather, the ReplicaSet control loop watches the shared state of the cluster through the API server and scales up or down as needed, based on the information in the API database (ultimately the etcd data store).
Likewise, the horizontal pod autoscaler does not send the required replica count to the deployment. It only updates the API data store with its calculated number of replicas. The deployment control loop then watches the API data store for changes to the required number of replicas.
This short description should be adequate to explain to beginners how the various Kubernetes components interact to make horizontal pod auto scaling possible. Later, beginners may want to read a more in-depth description of how Kubernetes controls its components internally: each control loop reads the shared object state in the API data store and makes changes that attempt to move the current state towards the desired state.
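If you want to watch this interplay yourself, one option is to read the relevant fields directly from the API objects. The commands below are only a sketch: the HPA records its recommendation in its own status, and the deployment's spec.replicas then reflects what the controllers have reconciled through the scale subresource.
kubectl get hpa my-hpa-deployment -o jsonpath='{.status.desiredReplicas}{"\n"}'
kubectl get deployment my-hpa-deployment -o jsonpath='{.spec.replicas}{"\n"}'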
Note that, for this tutorial in particular, I am running everything on a single node, so auto scaling does nothing to lessen the overall CPU load on that node. Rather, the single node simply runs more Pods, with each Pod under less CPU stress.
In a multi-node cluster, the purpose of horizontal pod auto scaling is to calculate the number of replicas needed to meet our desired CPU load percentage, so that the Kubernetes scheduler can distribute those Pods among several different hardware nodes. Horizontal pod auto scaling thus helps spread the CPU load across Pods on several different nodes.
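On a multi-node cluster you could see this spread for yourself; the -o wide flag adds a NODE column showing where each replica was scheduled (the label selector below matches the Pod template used in this tutorial):
kubectl get pods -l app=my-hpa-pod -o wide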
Note also that, if you run kubectl get rs, you will see a ReplicaSet. Our deployment acts as the ReplicaSet manager, taking its instructions from our horizontal pod autoscaler. I have, for the most part, ignored the ReplicaSet in this tutorial, even though it plays a crucial role. It adds another layer to understand, so we have focused only on the HPA commands and their output in this two-part tutorial series.
Now that our first auto scaling demo is finished, you'll want to delete the deployment with the following command:
kubectl delete -f myHPA-Deployment.yaml
deployment.apps "my-hpa-deployment" deleted
Note that deleting the deployment deletes the deployment object and its related API information, along with the ReplicaSet and all the Pod replicas. You can confirm this with the command shown below.
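If you list all three object kinds right after the delete, they should show as terminating or already be gone:
kubectl get deployment,replicaset,pods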
In the previous example, our deployment had to scale from one to five Pods to bring CPU utilization down to the percentage we specified.
Now, in this example, we will start our deployment with three Pods from the beginning. (I already ran this behind the scenes.) My expectation was that the deployment would scale to five replicas more quickly, since fewer auto scale adjustments are needed in this scenario.
So, let's see what happens. To start, make just one change to your deployment below: set replicas to 3.
nano myHPA-Deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-hpa-deployment
  labels:
    app: my-hpa-deploy
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: my-hpa-pod
  template:
    metadata:
      labels:
        app: my-hpa-pod
    spec:
      containers:
      - name: my-hpa-container
        image: HPA-example:latest
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 500m
      terminationGracePeriodSeconds: 0
Now, create this three-replica deployment with the following command:
kubectl create -f myHPA-Deployment.yaml
deployment.apps/my-hpa-deployment created
Next, you'll need to generate load from your second terminal again. Note that if you exited your exec session, you'll need to exec into the load generator Pod again, using the following command.
kubectl exec -it myloadgenpod -- /bin/sh
Enter this at the shell again:
while true; do wget -q -O- http://172.17.0.7:80; done
You can press the up arrow at the shell, as you normally would, to recall the previous command more quickly. While the loop is running in the foreground, you can monitor its effects in your original terminal.
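Instead of re-running the monitoring command by hand, you could also leave a watch running in the original terminal (assuming the watch utility is installed); it simply repeats kubectl get hpa every 30 seconds:
watch -n 30 kubectl get hpa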
Just as before, monitor with the kubectl get hpa command every 30 seconds or so, either by hand or with the watch command above. The following is the output:
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment <unknown>/45% 1 10 3 23s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment <unknown>/45% 1 10 3 55s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 0%/45% 1 10 3 65s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 0%/45% 1 10 3 108s
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 48%/45% 1 10 3 2m
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 48%/45% 1 10 3 2m46s
During the first three minutes, the number of replicas stayed at three. Moreover, the metrics server took one minute to transition from unknown CPU use to 0%.
In reality, the 0% is misleading: it is not up-to-date with what is actually going on, since the actual CPU utilization of each Pod is averaging around 48%. The metrics server takes another minute to update its CPU measurement to the accurate, current 48%.
One important lesson here is that we cannot evaluate HPA functionality from a single measurement; we need to consider what happens over time. On that note, at the end of this tutorial, I will explain how to let the metrics update more quickly.
Now, let's continue by investigating the detailed status of the deployment. We'll see that just two seconds ago it scaled up from three to five replicas. In particular, consider this line of the output:
Replicas: 5 desired | 5 updated | 5 total | 3 available | 2 unavailable
To explain the above output: the deployment wants five replicas and has created all five, but only three of them are available at this particular second. This is because the last two replicas are still busy starting up, which is why they show as 2 unavailable.
kubectl describe deployment.extensions/my-hpa-deployment
Name: my-hpa-deployment
Labels: app=my-hpa-deploy
Selector: app=my-hpa-pod
Replicas: 5 desired | 5 updated | 5 total | 3 available | 2 unavailable
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=my-hpa-pod
Containers:
my-hpa-container:
Image: HPA-example:latest
Requests:
cpu: 500m
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available False MinimumReplicasUnavailable
NewReplicaSet: my-hpa-deployment-78d4586d7f (5/5 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 3m23s deployment-controller Scaled up replica set my-hpa-deployment-78d4586d7f to 3
Normal ScalingReplicaSet 2s deployment-controller Scaled up replica set my-hpa-deployment-78d4586d7f to 5
We see below that the horizontal pod autoscaler did scale up to five replicas:
kubectl describe horizontalpodautoscaler.autoscaling/my-hpa-deployment
Name: my-hpa-deployment
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 62% (311m) / 45%
Min replicas: 1
Max replicas: 10
Deployment pods: 3 current / 5 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededRescale the HPA controller was able to update the target scale to 5
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 2m22s (x3 over 2m52s) horizontal-pod-autoscaler unable to get metrics for resource cpu: no metrics returned from resource metrics API
Warning FailedComputeMetricsReplicas 2m22s (x3 over 2m52s) horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Normal SuccessfulRescale 15s horizontal-pod-autoscaler New size: 5; reason: cpu resource utilization (percentage of request) above target
Below we confirm that the number of replicas is now five. Note, though, that it took from three minutes and 18 seconds to five minutes and 21 seconds for the reported CPU utilization to drop from 62% to the correct 35% per Pod.
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 62%/45% 1 10 5 3m18s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 59%/45% 1 10 5 4m4s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 59%/45% 1 10 5 4m46s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 35%/45% 1 10 5 5m21s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 36%/45% 1 10 5 6m13s
Note that the Conditions: output below is a bit confusing:
kubectl describe horizontalpodautoscaler.autoscaling/my-hpa-deployment
Name: my-hpa-deployment
Namespace: default
CreationTimestamp: Thu, 21 Feb 2019 16:10:15 +0200
Reference: Deployment/my-hpa-deployment
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 36% (180m) / 45%
Min replicas: 1
Max replicas: 10
Deployment pods: 5 current / 5 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ScaleDownStabilized recent recommendations were higher than current one, applying the highest recent recommendation
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 5m33s (x3 over 6m3s) horizontal-pod-autoscaler unable to get metrics for resource cpu: no metrics returned from resource metrics API
Warning FailedComputeMetricsReplicas 5m33s (x3 over 6m3s) horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Normal SuccessfulRescale 3m26s horizontal-pod-autoscaler New size: 5; reason: cpu resource utilization (percentage of request) above target
From this output, we can see that the horizontal pod autoscaler scaled up from three to five, which might lead us to expect something like ScaleUpStabilized instead. Despite this apparent contradiction, the longer prose message, repeated below, describes what is happening more accurately:
AbleToScale True ScaleDownStabilized recent recommendations were higher than current one, applying the highest recent recommendation
Remember that our previous example demonstrated scaling all the way down to one Pod. Now it's time to demonstrate auto scaling down to an in-between number of Pods. For this, of course, what is needed is a slightly lower workload.
Interestingly, inserting a sleep .05 into the loop turned out to be perfect for this demo on my four-core server. Go to your second terminal, stop the while loop by pressing Ctrl+C, and then enter the command below.
while true; do wget -q -O- http://172.17.0.7:80; sleep .05 ;done
Within a few minutes the metrics server should detect the lower workload, and the autoscaler should scale the Pods down. To see this, keep monitoring every 30 seconds. The output is as follows:
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 17%/45% 1 10 5 10m
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 19%/45% 1 10 5 14m
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 31%/45% 1 10 3 15m
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 31%/45% 1 10 3 15m
As expected, the replicas are scaled down from five to three.
kubectl describe horizontalpodautoscaler.autoscaling/my-hpa-deployment
Name: my-hpa-deployment
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 31% (157m) / 45%
Min replicas: 1
Max replicas: 10
Deployment pods: 3 current / 3 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ReadyForNewScale recommended size matches current size
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 12m horizontal-pod-autoscaler New size: 5; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 69s horizontal-pod-autoscaler New size: 3; reason: All metrics below target
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 32%/45% 1 10 3 16m
This output shows that the horizontal pod autoscaler can automatically scale both upward and downward in reaction to changing workloads.
Now, you'll want to remove all load from your service. To do this, go to the second terminal and press Ctrl+C to break out of the loop. Within roughly five minutes, the horizontal pod autoscaler will scale the Pod count from three down to just one; it will first scale down to two, then to one.
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 27%/45% 1 10 3 17m
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 0%/45% 1 10 3 18m
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 0%/45% 1 10 3 20m
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 0%/45% 1 10 3 21m
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 0%/45% 1 10 2 22m
Now, let's investigate in more detail:
kubectl describe horizontalpodautoscaler.autoscaling/my-hpa-deployment
Name: my-hpa-deployment
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Thu, 21 Feb 2019 16:10:15 +0200
Reference: Deployment/my-hpa-deployment
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 0% (0) / 45%
Min replicas: 1
Max replicas: 10
Deployment pods: 2 current / 2 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ScaleDownStabilized recent recommendations were higher than current one, applying the highest recent recommendation
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 21m (x3 over 22m) horizontal-pod-autoscaler unable to get metrics for resource cpu: no metrics returned from resource metrics API
Warning FailedComputeMetricsReplicas 21m (x3 over 22m) horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Normal SuccessfulRescale 19m horizontal-pod-autoscaler New size: 5; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 7m44s horizontal-pod-autoscaler New size: 3; reason: All metrics below target
Normal SuccessfulRescale 44s horizontal-pod-autoscaler New size: 2; reason: All metrics below target
The current status of your horizontal pod autoscaler deployment is as follows:
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 0%/45% 1 10 1 24m
And, the final output for your horizontal pod autoscaler is as follows:
kubectl describe horizontalpodautoscaler.autoscaling/my-hpa-deployment
Name: my-hpa-deployment
Reference: Deployment/my-hpa-deployment
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 96s horizontal-pod-autoscaler New size: 1; reason: All metrics below target
Now, let's clean up by deleting everything left over from the demonstrations above. First, you'll want to delete the load generator Pod using the following command:
kubectl delete pod/myloadgenpod
pod "myloadgenpod" deleted
Next, you can delete the horizontal pod autoscaler:
kubectl delete horizontalpodautoscaler.autoscaling/my-hpa-deployment
horizontalpodautoscaler.autoscaling "my-hpa-deployment" deleted
Deleting your horizontal pod autoscaler only deletes its API definition: the minimum and maximum Pod counts and the target CPU utilization percentage. However, deleting the HPA does not delete your deployment or its Pods, as you can verify with the command below, so you'll want to delete those separately.
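The deployment and its remaining Pod should still be listed after the HPA is gone:
kubectl get deployment,pods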
Now, let's delete the deployment:
kubectl delete -f myHPA-Deployment.yaml
deployment.apps "my-hpa-deployment" deleted
This will delete the deployment, its ReplicaSet, and all its Pods. Then, delete the service with the following command:
kubectl delete svc/my-service
service "my-service" deleted
Last, if you no longer need the Docker PHP and Apache image, you can delete it as well with the following command:
docker rmi HPA-example:latest
The exercises that we went through in the demonstrations above show us exactly how the auto scaling algorithm works. Consider the following excerpt from the algorithm details page of the Kubernetes documentation.
...if the current metric value is 200m, and the desired value is 100m, the number of replicas will be doubled, since 200.0 / 100.0 == 2.0
If the current value is instead 50m, we'll halve the number of replicas, since 50.0 / 100.0 == 0.5.
We'll skip scaling if the ratio is sufficiently close to 1.0 (within a globally-configurable tolerance, from the --horizontal-pod-autoscaler-tolerance flag, which defaults to 0.1).
... the currentMetricValue is computed by taking the average of the given metric across all Pods in the HorizontalPodAutoscaler's scale target.
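As a quick sanity check, we can apply the documented formula, desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue ), to the second example above, where three replicas were running at roughly 62% CPU against a 45% target:
desiredReplicas = ceil( 3 * 62 / 45 )
                = ceil( 4.13 )
                = 5
This matches what we observed: the deployment jumped straight from three to five replicas.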
Now, let's discuss the delay that we noticed throughout this tutorial. As we saw, the HPA waits five minutes before scaling the number of replicas down. This is only the default setting, and it can be changed: you can reduce this time with the --horizontal-pod-autoscaler-downscale-delay flag.
As a point of reference, consider the information presented in the document Support for cooldown/delay.
Before you implement any changes, be aware of the consequences. The points below were taken from the above document; a sketch of how you might set the flag follows this list.
- When tuning these parameter values, a cluster operator should be aware of the possible consequences.
- If the delay (cooldown) value is set too long, there could be complaints that the Horizontal Pod Autoscaler is not responsive to workload changes.
- However, if the delay value is set too short, the scale of the replicas set may keep thrashing as usual.
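As a sketch only: the delay is a kube-controller-manager flag, so on a single-node minikube cluster (which is what I assume you are running this tutorial on) it could be passed at start-up with --extra-config. The values below are examples, not recommendations, and on newer Kubernetes versions this flag is superseded by --horizontal-pod-autoscaler-downscale-stabilization, so check the documentation for your version first.
# Example values only: shorten the downscale delay to two minutes and the HPA sync period to ten seconds
minikube start \
  --extra-config=controller-manager.horizontal-pod-autoscaler-downscale-delay=2m0s \
  --extra-config=controller-manager.horizontal-pod-autoscaler-sync-period=10s
Separately, the metrics server's own scrape interval (its --metric-resolution flag) limits how quickly the HPA sees fresh CPU numbers, which accounts for the measurement lag we observed earlier.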