By Alwyn Botha, Alibaba Cloud Community Blog author.
According to the Horizontal Pod Autoscaler page of the Kubernetes Documentation, the Horizontal Pod Autoscaler (or HPA for short) of Kubernetes can be described as follows:
The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics).
In this tutorial, we are going to focus on the first part of this definition: auto scaling based on observed CPU utilization. Once you understand the basics, you can take full advantage of the fact that auto scaling can be based on several different metrics at the same time, and you will be able to use various software to provide you with additional metrics.
This tutorial is the first part of a two-part series. You can find the other tutorial in this series here.
In this part, we will cover the Docker build, the Kubernetes deployment, and monitoring horizontal pod autoscaler functions using kubectl.
Auto scaling itself is relatively easy to understand. However, to be able to implement auto scaling well, you'll need the knowledge and know-how of the following operations and topics:
It's important to note that, as a point of reference, this tutorial is made with minikube running locally on Windows 10. Minikube also works on Linux. Of course, as an alternative, you can also use the full-install of Kubernetes. For this tutorial, you'll need one Kubernetes node with at least two cores. For the example in this tutorial, I'll be using a four-core Kubernetes node.
Another important consideration is that the term auto scaling, as far as we are concerned, is unrelated to scaling of a workload across CPUs, which is the job of the operating system. Rather, the term relates to the scaling in the number of Kubernetes Pods. Through auto scaling capabilities, for example, instead of having one Pod using 400% CPU at one time, you can have eight Pods using 50% CPU each. In this tutorial, you will probably only be running the CPU at 25% to 45% usage for around two hours or so. Therefore, you won't benefit much from faster CPUs and more cores in this tutorial.
The metrics server gathers stats every minute by default, and therefore auto scaling only happens every five minutes or so. As such, your workload has to run for several minutes before you'll be able to see the HPA act automatically. Therefore, it's best to run this tutorial on a dedicated server in the cloud. For this, consider using an Alibaba Cloud ECS instance.
In this tutorial, you will complete the following operations:
For the first leg of this tutorial, you'll need to create a custom Docker image that runs Apache and PHP. This Docker image will do CPU-intensive work through a five-line PHP program, which is taken from the Kubernetes website. To start, add the following two files to a temporary working directory.
Dockerfile
FROM php:5-apache
ADD index.php /var/www/html/index.php
RUN chmod a+rx index.php
index.php
<?php
$x = 0.0001;
for ($i = 0; $i <= 1000000; $i++) {
  $x += sqrt($x);
}
echo "OK!";
?>
Next, use docker build to build your image. It will download the Apache and PHP images (which are 360 MB in total) if you do not have these images already. Also, note that hpa-example:latest is the name of your Docker image (Docker repository names must be lowercase). You'll need to refer to this image from your worker Pod.
docker build -t hpa-example .
Sending build context to Docker daemon 5.632kB
Step 1/3 : FROM php:5-apache
5-apache: Pulling from library/php
5e6ec7f28fb7: Pull complete
cf165947b5b7: Pull complete
7bd37682846d: Pull complete
99daf8e838e1: Pull complete
ae320713efba: Pull complete
ebcb99c48d8c: Pull complete
9867e71b4ab6: Pull complete
936eb418164a: Pull complete
bc298e7adaf7: Pull complete
ccd61b587bcd: Pull complete
b2d4b347f67c: Pull complete
56e9dde34152: Pull complete
9ad99b17eb78: Pull complete
Digest: sha256:0a40fd273961b99d8afe69a61a68c73c04bc0caa9de384d3b2dd9e7986eec86d
Status: Downloaded newer image for php:5-apache
---> 24c791995c1e
Step 2/3 : ADD index.php /var/www/html/index.php
---> 9ccff8324890
Step 3/3 : RUN chmod a+rx index.php
---> Running in ab82b65295b9
Removing intermediate container ab82b65295b9
---> 8a322330700f
Successfully built 8a322330700f
Successfully tagged hpa-example:latest
Next, make sure that you have this docker image available:
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hpa-example latest 8a322330700f 26 seconds ago 355MB
Now that all of this is complete, let's turn to Kubernetes.
Kubernetes cannot automatically scale single Pods. Therefore, you'll need a higher level manager that inherently allows itself to be adjusted. You can learn about one at the Deployments page of the Kubernetes documentation.
The required specifications for your deployment are as follows:
- replicas: 1
  You only need one Pod running to start. Your horizontal pod autoscaler (HPA) definition will cause the number of replicas to be scaled automatically based on your own specific requirements.
- image: hpa-example:latest
  You'll need to use your own custom Docker image with your work-generator PHP program.
- app: my-hpa-pod
  This is a label. Your service must refer to this label. The service is a front-end to this Pod.
- requests: cpu: 500m
  You'll need to have such a request definition. The HPA expresses CPU utilization as a percentage of this request, so without it the autoscaler has nothing to compare the measured usage against.
Now consider the following:
nano myHPA-Deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-hpa-deployment
  labels:
    app: my-hpa-deploy
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: my-hpa-pod
  template:
    metadata:
      labels:
        app: my-hpa-pod
    spec:
      containers:
      - name: my-hpa-container
        image: hpa-example:latest
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 500m
      terminationGracePeriodSeconds: 0
Next, define the service. Something important to note is selector: app: my-hpa-pod, which matches the label of the Pod in your deployment defined above. This service is a front-end to that Pod. It will be accessed through port 80. You can read more about what a Service is here.
nano myService.yaml
kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  selector:
    app: my-hpa-pod
  ports:
  - protocol: TCP
    port: 80
    # Apache in the php:5-apache image listens on port 80
    targetPort: 80
Now, you'll want to create your service.
kubectl create -f myService.yaml
service/my-service created
Next, list your services with the kubectl get command. For more information, you can read about this command in the Kubernetes documentation, which provides many useful examples.
kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 16d
my-service ClusterIP 10.109.175.151 <none> 80/TCP 4s
Next, create your deployment:
kubectl create -f myHPA-Deployment.yaml
deployment.apps/my-hpa-deployment created
And list deployments with the following command. You'll see your desired Pod running.
kubectl get deploy
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
my-hpa-deployment 1 1 1 1 8s
Define your auto scaling requirements as follows:
- deployment my-hpa-deployment
  This is the deployment that you want to autoscale.
- --cpu-percent=45
  The autoscaler aims to keep the average CPU utilization of the Pods at 45% of their CPU request.
- --min=1 --max=10
  By doing so, you define that the autoscaler may scale the number of Pods in your deployment from 1 to at most 10 Pods.
kubectl autoscale deployment my-hpa-deployment --cpu-percent=45 --min=1 --max=10
horizontalpodautoscaler.autoscaling/my-hpa-deployment autoscaled
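The kubectl autoscale command above can also be expressed as a manifest file, in the same style as the deployment and service files. The following is a minimal sketch, assuming the autoscaling/v1 API; the file name my-hpa.yaml is made up for this example and is not part of the original tutorial.

```shell
# Write a declarative equivalent of the kubectl autoscale command.
# The file name my-hpa.yaml is an assumption made for this sketch.
cat > my-hpa.yaml <<'EOF'
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-hpa-deployment
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 45
EOF

# To use it in place of kubectl autoscale (not in addition to it), you would run:
#   kubectl apply -f my-hpa.yaml
```

Keeping the autoscaler in a file makes it easier to review and recreate, at the cost of one more file to maintain.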
Next, you'll want to determine the status of your HPA:
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment <unknown>/45% 1 10 0 8s
In the above, the TARGETS column shows the current CPU utilization followed by the target of 45%. It will take several minutes for the metrics server values to become available, which is why the current value is still <unknown>. However, at this stage you have:
Note that, if you run the kubectl get rs command, you will also see a ReplicaSet. Your deployment is the ReplicaSet manager that gets its instructions from your HPA.
All of this is running, so the only thing still missing is CPU load on your PHP Pod. For this, you'll need to define a load generator Pod. Below is a basic BusyBox Pod that you can exec into. From there, you will wget the PHP webpage in a shell loop.
nano myLoad-Generator.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myloadgenpod
  labels:
    app: my-loadgen-pod
spec:
  containers:
  - name: my-loadgen-container
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ['sh', '-c', 'sleep 3600']
  restartPolicy: Never
  terminationGracePeriodSeconds: 0
Create the load generator Pod.
kubectl create -f myLoad-Generator.yaml
pod/myloadgenpod created
Next, use the kubectl exec command to open a shell in myloadgenpod. For reference, check out this document. Run the following in a separate terminal window so that, while the loop is running in the foreground there, you can monitor its effects in your original terminal.
kubectl exec -it myloadgenpod -- /bin/sh
/ # cd tmp
/tmp # wget http://172.17.0.7:80
Connecting to 172.17.0.7:80 (172.17.0.7:80)
index.html 100% |************************************************************************| 26 0:00:00 ETA
/tmp # cat index.html
OK
/tmp #
The session above is a quick check that your load generator Pod can access your PHP webpage. First, cd tmp changes to the tmp directory. Then, wget http://172.17.0.7:80 fetches your index.php page and saves it as index.html. The IP address shown is the one from my cluster, so substitute the address that applies in yours. Finally, cat index.html displays the downloaded page; if it returns OK, then it works.
Now that you know it works, you can send some workload to index.php. Enter this at the shell prompt:
while true; do wget -q -O- http://172.17.0.7:80; done
The above is an endless loop that fetches your index.php page.
Now, go back to your original terminal to investigate the horizontal pod autoscaler's behavior under this workload. If you find your terminal to be very slow because your node is overwhelmed by this work, you can experiment to lessen the impact on your shell response time.
If you have to, experiment by lessening the load and checking CPU utilization levels with the top command. A CPU utilization of around 50% is fine. That means that, if you have four cores, as I have in my example, the total CPU capacity is 4 * 100 = 400, so a usage of 200 is absolutely fine.
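The capacity arithmetic above can be checked with a quick calculation. This is only an illustrative sketch: the function name node_utilization_percent is made up here, and it assumes top-style figures in which one fully busy core counts as 100.

```shell
# node_utilization_percent TOTAL_CPU_USAGE CORES
# Hypothetical helper: converts a summed, top-style CPU figure
# (100 = one fully busy core) into a percentage of the whole node.
node_utilization_percent() {
  usage=$1
  cores=$2
  capacity=$(( cores * 100 ))
  echo $(( usage * 100 / capacity ))
}

node_utilization_percent 200 4   # four cores with a total usage of 200
```

With four cores and a usage of 200, this prints 50, which is the "around 50%" level described above.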
You should now have one Pod running Apache with five PHP child processes that are processing this workload. Below are the suggested experiment values:
while true; do wget -q -O- http://172.17.0.7:80; sleep .05 ; done
while true; do wget -q -O- http://172.17.0.7:80; sleep .09 ; done
while true; do wget -q -O- http://172.17.0.7:80; sleep .2 ; done
If you're using four cores like me, then no sleep is required to lessen the overall CPU load.
kubectl get hpa
Now, you will monitor HPA functionality with kubectl get hpa. For this tutorial, you'll want to enter this command every 30 seconds in the same terminal. As seen below, within a few minutes you can see auto scaling occurring over time.
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 0%/45% 1 10 1 75s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 0%/45% 1 10 1 110s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 40%/45% 1 10 1 3m17s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 187%/45% 1 10 1 3m31s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 187%/45% 1 10 4 3m52s
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 186%/45% 1 10 5 4m31s
The metrics server takes about three minutes to record that CPU use is more than 0%, and it's not until around three minutes and 30 seconds that it measures a CPU use of 187% for the one Pod. The horizontal pod autoscaler then decides to scale to four Pods because the CPU utilization is above the target you set. Then, about 40 seconds later, it scales to five Pods for the same reason.
Unfortunately, the measured CPU use stays at 186% with five Pods. The reported value is not divided by five right away, because it only changes once fresh metrics that include the new Pods have been collected. Later, we will see this more clearly.
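The scaling decisions described above follow the formula given in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below reproduces that arithmetic with integer math; the function name desired_replicas is made up for this example, and it deliberately ignores everything else the controller does, such as tolerances and limits on how far a single scale-up step may go (which, as far as I can tell, would be consistent with the intermediate step to four Pods seen above).

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_UTILIZATION TARGET_UTILIZATION
# Hypothetical helper: prints
#   ceil(current_replicas * current_utilization / target_utilization)
# using integer arithmetic only.
desired_replicas() {
  current_replicas=$1
  current_util=$2
  target_util=$3
  echo $(( (current_replicas * current_util + target_util - 1) / target_util ))
}

desired_replicas 1 187 45   # one Pod at 187% against a 45% target
desired_replicas 5 37 45    # five Pods averaging 37% against a 45% target
```

Both calls print 5: one Pod at 187% calls for five Pods, and five Pods averaging 37% still need five, so no further scaling is expected at that load.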
At this point, you can view the horizontal pod autoscaler details for this deployment to see when it scaled up or down and why it did so. In particular, you can use the kubectl describe command for that:
kubectl describe horizontalpodautoscaler.autoscaling/my-hpa-deployment
Name: my-hpa-deployment
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Thu, 21 Feb 2019 15:37:59 +0200
Reference: Deployment/my-hpa-deployment
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 186% (933m) / 45%
Min replicas: 1
Max replicas: 10
Deployment pods: 5 current / 5 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ReadyForNewScale recommended size matches current size
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 3m37s (x4 over 4m22s) horizontal-pod-autoscaler unable to get metrics for resource cpu: no metrics returned from resource metrics API
Warning FailedComputeMetricsReplicas 3m37s (x4 over 4m22s) horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Normal SuccessfulRescale 74s horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 59s horizontal-pod-autoscaler New size: 5; reason:
Now, let's discuss the conditions and events seen above.
- AbleToScale True ReadyForNewScale
  This condition indicates whether or not the HPA is able to fetch and update scales, as well as whether or not any backoff-related conditions would prevent scaling.
- ScalingActive True ValidMetricFound
  ScalingActive indicates whether or not the horizontal pod autoscaler is enabled (more specifically, whether the replica count of the target is not zero) and is able to calculate desired scales. When ScalingActive is False, it usually means that there are problems fetching the corresponding metrics.
- ScalingLimited False
  You have a minimum replica number of 1 and a maximum of 10. The current number of Pods is 5. This deployment is able to scale up or down within that range, so scaling is not limited.
Of the events seen above, the first event, Warning FailedGetResourceMetric, is abnormal. It indicates that the metrics server was still busy gathering data and had not yet sent any data for the HPA to use. However, the last two events are normal. They show the autoscaler in action, as well as what it did and why.
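The Metrics line in the describe output, 186% (933m) / 45%, is the measured CPU use expressed as a percentage of the 500m CPU request from the deployment. A minimal sketch of that calculation follows; the function name cpu_utilization_percent is made up for this example, and the use of integer division is an assumption that happens to reproduce the value shown.

```shell
# cpu_utilization_percent USAGE_MILLICORES REQUEST_MILLICORES
# Hypothetical helper: CPU usage as a whole-number percentage of the
# requested CPU, both given in millicores.
cpu_utilization_percent() {
  usage_m=$1
  request_m=$2
  echo $(( usage_m * 100 / request_m ))
}

cpu_utilization_percent 933 500   # 933m used against a 500m request
```

This prints 186, matching the describe output. It also shows why the request definition matters: with a larger request, the same 933m would count as a lower utilization.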
You still need to see the CPU use per Pod come down. If you continue to monitor with kubectl get hpa, you'll see that happen after a few minutes. In fact, after five minutes and 32 seconds, the CPU use per Pod is at 37%, and it continues to hover around that value since the workload is a steady stream of wget requests.
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 186%/45% 1 10 5 5m6s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 37%/45% 1 10 5 5m32s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 36%/45% 1 10 5 6m33s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
my-hpa-deployment Deployment/my-hpa-deployment 36%/45% 1 10 5 6m45s
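Instead of retyping kubectl get hpa by hand, the repeated checks can be scripted. The following is a small sketch of a polling helper; the function name poll and its arguments are made up for this example. kubectl get also accepts a --watch flag, which is another way to follow changes.

```shell
# poll INTERVAL_SECONDS COUNT COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND a fixed number of times,
# sleeping INTERVAL_SECONDS between runs.
poll() {
  interval=$1
  count=$2
  shift 2
  i=0
  while [ "$i" -lt "$count" ]; do
    "$@"
    i=$(( i + 1 ))
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}

# Against a cluster, you would check the HPA every 30 seconds, 20 times:
#   poll 30 20 kubectl get hpa
# Harmless local demonstration that needs no cluster:
poll 0 3 echo "checking HPA"
```

A fixed count keeps the loop from running forever, unlike the endless load loop used earlier.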