By Alwyn Botha, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud's incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.
This tutorial aims to give you practical experience of using Docker container resource limitation functionalities on an Alibaba Cloud Elastic Compute Service (ECS) instance.
CPU shares are relative: a container's slice of CPU time is its shares divided by the total shares of all busy containers. The default value of 1024 has no intrinsic meaning.
If all containers have --cpu-shares=4, they all share CPU time equally.
This is identical to all containers having --cpu-shares=1024: in both cases they share CPU time equally.
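As a worked example of that proportion (the values here are illustrative, not taken from the runs below): with three busy containers holding shares of 1024, 512, and 512, each container's slice under full contention is its shares divided by the total.

```shell
# Illustrative only: CPU slice per container = shares / sum(shares).
# This only matters when the CPUs are fully contended.
shares="1024 512 512"
total=0
for s in $shares; do total=$((total + s)); done   # total = 2048
for s in $shares; do
  echo "shares=$s -> $((100 * s / total))% of total CPU time"
done
```

This prints 50%, 25%, and 25%: the absolute numbers are irrelevant, only the ratios matter.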
Run:
docker container run -d --cpu-shares=4 --name mycpu1024a alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'
docker container run -d --cpu-shares=4 --name mycpu1024b alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'
docker container run -d --cpu-shares=4 --name mycpu1024c alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'
Investigate logs:
docker logs mycpu1024a
docker logs mycpu1024b
docker logs mycpu1024c
Prune the containers; we are done with them.
docker container prune -f
Note that they all still ran in the same time. They did not run 4/1024 as fast.
CPU shares are only enforced when CPU cycles are constrained.
With no other containers running, defining CPU shares for one container is meaningless.
docker container run -d --cpu-shares=4 --name mycpu1024a alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'
docker logs mycpu1024a
real 0m 12.67s
user 0m 0.00s
sys 0m 12.27s
Now increase the shares to 4000 and rerun: there is zero difference in runtime.
A single container uses all available CPU time, so no sharing is needed.
docker container run -d --cpu-shares=4000 --name mycpu1024a alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'
Prune this one container; we are done with it.
docker container prune -f
Specify how much of all the available CPU resources a container can use. For instance, if the host machine has two CPUs and you set --cpus="1.5", the container is guaranteed at most one and a half of the CPUs.
Note the range of --cpus values used in the commands below. Run them:
docker container run -d --cpus=2 --name mycpu2 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=50 | md5sum'
docker container run -d --cpus=1 --name mycpu1 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=50 | md5sum'
docker container run -d --cpus=.5 --name mycpu.5 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=50 | md5sum'
docker container run -d --cpus=.25 --name mycpu.25 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=20 | md5sum'
docker container run -d --cpus=.1 --name mycpu.1 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=10 | md5sum'
Investigate docker stats
docker stats
Expected output:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
843bea7263fb mycpu2 57.69% 1.258MiB / 985.2MiB 0.13% 578B / 0B 1.33MB / 0B 0
186ba15b8258 mycpu1 55.85% 1.25MiB / 985.2MiB 0.13% 578B / 0B 1.33MB / 0B 0
3bcc26eab1ac mycpu.5 46.60% 1.262MiB / 985.2MiB 0.13% 578B / 0B 1.33MB / 0B 0
79d7d7e3c38c mycpu.25 25.43% 1.262MiB / 985.2MiB 0.13% 508B / 0B 1.33MB / 0B 0
b4ba5503a048 mycpu.1 9.76% 1.328MiB / 985.2MiB 0.13% 508B / 0B 1.33MB / 0B 0
mycpu.1, mycpu.25 and mycpu.5 perfectly demonstrate the restrictions applied.
However, mycpu1 and mycpu2 do not have an additional 100% and 200% of CPU available. Their limits therefore cannot be reached, and they share the remaining CPU time equally.
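The arithmetic behind that split, as a sketch using the limits above and assuming a 2-CPU host (200% total capacity):

```shell
# The capped containers can take at most 10% + 25% + 50% = 85%
# of the 200% total, leaving the rest for mycpu1 and mycpu2 to split.
awk 'BEGIN {
  left = 200 - (10 + 25 + 50)             # 115% remaining
  printf "about %.1f%% each\n", left / 2  # close to the ~57%/56% observed
}'
```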
The --cpus setting defines the number of CPUs a container may use.
For the purposes of Docker and Linux distros, CPUs are counted as:
CPUs = Threads per core X cores per socket X sockets
These are logical CPUs, not physical processor chips.
Let's investigate my server to determine its number of CPUs.
lscpu | head -n 10
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
With the unneeded information removed, the relevant lscpu fields are:
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
CPUs = Threads per core X cores per socket X sockets
CPUs = 1 x 2 x 1 = 2 CPUs
Confirm with:
grep -E 'processor|core id' /proc/cpuinfo
Two processor entries = 2 CPUs; two distinct core id values = 2 cores per socket:
processor : 0
core id : 0
processor : 1
core id : 1
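Other standard ways to count logical CPUs, as a quick sketch (nproc and getconf ship with coreutils and glibc on typical Linux systems):

```shell
# Each of these should report the same logical CPU count (2 on this server).
nproc
getconf _NPROCESSORS_ONLN
grep -c '^processor' /proc/cpuinfo
```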
OK, this server has 2 CPUs. Your server will be different, so consider that when you interpret the tests below.
The --cpus setting defines the number of CPUs a container may use.
Let's use both CPUs, just one, a half and a quarter CPU and record runtimes for CPU-heavy workload.
Note --cpus=2
docker container run --cpus=2 --name mycpu alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=30 | md5sum'
Expected output:
real 0m 3.61s
user 0m 0.00s
sys 0m 3.50s
We have nothing to compare against. Let's run the other tests.
docker container prune -f
Note --cpus=1
docker container run --cpus=1 --name mycpu alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=30 | md5sum'
real 0m 3.54s
user 0m 0.00s
sys 0m 3.37s
docker container prune -f
Note --cpus=.5
docker container run --cpus=.5 --name mycpu alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=30 | md5sum'
real 0m 9.97s
user 0m 0.00s
sys 0m 4.78s
docker container prune -f
Note --cpus=.25
docker container run --rm --cpus=.25 --name mycpu alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=30 | md5sum'
real 0m 19.55s
user 0m 0.00s
sys 0m 4.69s
--cpus=2 realtime: 3.6 sec
--cpus=1 realtime: 3.5 sec
--cpus=.5 realtime: 9.9 sec
--cpus=.25 realtime: 19.5 sec
Our simple benchmark does not effectively use 2 CPUs simultaneously.
Half a CPU runs twice as slowly; a quarter CPU runs four times as slowly.
The --cpus setting works. If the applications inside your containers cannot multithread or effectively use more than one CPU, allocate just one CPU.
The following options are obsolete; if you use Docker 1.13 or higher, use --cpus instead.
--cpu-period: limits the CPU CFS (Completely Fair Scheduler) period
--cpu-quota: limits the CPU CFS (Completely Fair Scheduler) quota
Our exercises above clearly show how easy it is to use the --cpus setting.
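For reference, --cpus is shorthand for that quota/period pair: Docker sets cpu-quota = cpus × cpu-period, with a default period of 100000 microseconds. A quick sketch of the arithmetic:

```shell
period=100000          # default CFS period in microseconds
cpus_x100=50           # 0.5 CPUs, scaled by 100 for integer arithmetic
quota=$((cpus_x100 * period / 100))
echo "--cpus=.5 is equivalent to --cpu-period=$period --cpu-quota=$quota"
```

So --cpus=.5 grants the container 50000 microseconds of CPU time per 100000-microsecond scheduling period.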
--cpuset-cpus: CPUs in which to allow execution (e.g., 0-3 or 0,1)
Unfortunately my server has only 2 CPUs, and we saw moments ago that using more than one CPU has no effect with this specific benchmark.
If your server has several CPUs you can run much more interesting combinations of --cpuset-cpus settings. It will still not be useful here, though: this specific benchmark uses only one thread.
Later in this tutorial there are tests using sysbench (an actual benchmark tool) that let you specify the number of threads.
Here are my results: no difference between using both CPUs, just CPU 1, or just CPU 0.
docker container run --rm --cpuset-cpus=0,1 --name mycpu alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=30 | md5sum'
Expected output:
real 0m 3.44s
user 0m 0.00s
sys 0m 3.35s
docker container run --rm --cpuset-cpus=0 --name mycpu alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=30 | md5sum'
Expected output:
real 0m 4.15s
user 0m 0.00s
sys 0m 4.00s
docker container run --rm --cpuset-cpus=1 --name mycpu alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=30 | md5sum'
Expected output:
real 0m 3.40s
user 0m 0.00s
sys 0m 3.28s
All the tests above were quick hacks.
To properly test container resource limits we need real Linux benchmark applications.
I am used to CentOS, so I will use it as the basis of our bench container. Both bench applications are available on Debian / Ubuntu as well; you could easily translate the yum installs to apt-get installs and get identical results.
We need to install two bench applications in our container. The best way is to build an image with those applications included.
Therefore create a dockerbench directory:
mkdir dockerbench
cd dockerbench
nano Dockerfile
FROM centos:7
RUN set -x \
&& yum -y install https://www.percona.com/redir/downloads/percona-release/redhat/0.0-1/percona-release-0.0-1.x86_64.rpm \
&& yum -y install sysbench \
&& curl http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm -o epel-release-latest-7.noarch.rpm \
&& rpm -ivh epel-release-latest-7.noarch.rpm \
&& yum -y install stress
The first install adds the Percona yum repo, the home of sysbench; yum then installs sysbench.
The curl adds the EPEL yum repo, the home of stress; yum then installs stress.
Build our bench image. It will take about a minute, depending on yum download speed, dependency resolution, and the usual other activities.
If you do not already have the CentOS 7 image downloaded, it may take another minute.
docker build --tag centos:bench --file Dockerfile .
Now we have a CentOS bench image ready for repeated use, with both bench tools installed.
docker run -it --rm centos:bench /bin/sh
Syntax:
sysbench --threads=2 --events=4 --cpu-max-prime=800500 --verbosity=0 cpu run
Through experimentation I determined 800500 to be a value that runs tests quickly enough on my 10-year-old computer (CPUmark 700). I added the 5 because that many zero digits in a row are difficult to read.
2 CPUs:
docker run -it --rm --name mybench --cpus 2 centos:bench /bin/sh
sh-4.2# time sysbench --threads=2 --events=4 --cpu-max-prime=800500 --verbosity=0 cpu run
real 0m1.952s
user 0m3.803s
sys 0m0.029s
sh-4.2#
sh-4.2# exit
exit
Real is wall-clock time: the time from start to finish of sysbench, here 1.9 seconds.
User is the amount of CPU time spent in user-mode code (outside the kernel) within sysbench. Two CPUs were used, and each spent about 1.9 seconds of CPU time, so the total user time is the sum across both CPUs.
The elapsed wall-clock time is still 1.9 seconds: since the 2 CPUs worked concurrently, their combined time shows up as user time.
Sys is the amount of CPU time spent in the kernel doing system calls.
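A quick sanity check on those numbers: with 2 threads saturating 2 CPUs, user time should be close to real time multiplied by 2. Using the figures from the run above:

```shell
# user (3.803s) divided by real (1.952s) should be close to 2,
# confirming both CPUs were busy for essentially the whole run.
awk 'BEGIN { printf "user/real ratio = %.2f\n", 3.803 / 1.952 }'
```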
One CPU:
docker run -it --rm --name mybench --cpus 1 centos:bench /bin/sh
sh-4.2# time sysbench --threads=2 --events=4 --cpu-max-prime=800500 --verbosity=0 cpu run
real 0m4.686s
user 0m4.678s
sys 0m0.026s
sh-4.2#
sh-4.2# exit
exit
A more convenient way to run these comparisons is to run the bench command right on the docker run line.
Let's rerun one CPU this way:
docker run -it --rm --name mybench --cpus 1 centos:bench /bin/sh -c 'time sysbench --threads=2 --events=4 --cpu-max-prime=800500 --verbosity=0 cpu run'
real 0m4.659s
user 0m4.649s
sys 0m0.028s
Let's run half a CPU this way:
docker run -it --rm --name mybench --cpus .5 centos:bench /bin/sh -c 'time sysbench --threads=2 --events=4 --cpu-max-prime=800500 --verbosity=0 cpu run'
real 0m10.506s
user 0m5.221s
sys 0m0.035s
The results make perfect sense: half a CPU takes roughly twice as long as one CPU.
With sysbench baked into our image, such tests are quick and easy. In mere seconds you gain hands-on experience limiting a Docker container's CPU usage.
Frankly, waiting 10.5 seconds for the .5 CPU test is too long, especially if you have a server with many cores.
If you ran this on a development server at work, the CPU load could change drastically over the course of a minute. Developers could be compiling during the 2-second 2-CPU run while the server sits CPU-quiet for the longer 5-second 1-CPU run, totally skewing our numbers.
We need an approach that is somewhat robust against such changing conditions: every test must run as quickly as possible, directly one after the other.
That sounds promising, so let's try it: reduce the max prime number 100-fold.
Cut and paste all three of these instructions in one go and observe the results:
docker run -it --rm --name mybench --cpus 2 centos:bench /bin/sh -c 'time sysbench --threads=2 --events=4 --cpu-max-prime=8005 --verbosity=0 cpu run'
docker run -it --rm --name mybench --cpus .5 centos:bench /bin/sh -c 'time sysbench --threads=2 --events=4 --cpu-max-prime=8005 --verbosity=0 cpu run'
docker run -it --rm --name mybench --cpus 1 centos:bench /bin/sh -c 'time sysbench --threads=2 --events=4 --cpu-max-prime=8005 --verbosity=0 cpu run'
Expected output:
2 CPUs
real 0m0.049s
user 0m0.016s
sys 0m0.021s
1 CPU
real 0m0.049s
user 0m0.019s
sys 0m0.020s
.5 CPU
real 0m0.051s
user 0m0.020s
sys 0m0.019s
Benchmark startup overhead overwhelms the wall-clock (real) times; these tests are hopelessly short.
After three private experiments, decreasing the original workload 10-fold seems about right.
docker run -it --rm --name mybench --cpus 2 centos:bench /bin/sh -c 'time sysbench --threads=2 --events=4 --cpu-max-prime=100500 --verbosity=0 cpu run'
docker run -it --rm --name mybench --cpus 1 centos:bench /bin/sh -c 'time sysbench --threads=2 --events=4 --cpu-max-prime=100500 --verbosity=0 cpu run'
docker run -it --rm --name mybench --cpus .5 centos:bench /bin/sh -c 'time sysbench --threads=2 --events=4 --cpu-max-prime=100500 --verbosity=0 cpu run'
(The 5 in there is just to make the long string of zeros more readable.)
2 CPUs
real 0m0.152s
user 0m0.225s
sys 0m0.015s
1 CPU
real 0m0.277s
user 0m0.279s
sys 0m0.019s
.5 CPU
real 0m0.615s
user 0m0.290s
sys 0m0.024s
The ratios look right. Overall runtime is under a second, which minimizes the effect of changing CPU load on the development server on our timings.
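To make the ratios explicit, divide each wall-clock time by the 2-CPU baseline (figures taken from the runs above):

```shell
# real times: 2 CPUs = 0.152s, 1 CPU = 0.277s, .5 CPU = 0.615s
for t in 0.152 0.277 0.615; do
  awk -v t="$t" 'BEGIN { printf "%.3fs is %.1fx the 2-CPU baseline\n", t, t / 0.152 }'
done
```

The half-CPU run lands at about 4x the baseline and the one-CPU run at about 1.8x, close to the ideal 4x and 2x for a CPU-bound workload.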
Spend a few minutes experimenting on your server to get an understanding of what is explained here.
Note that I used --rm on the run commands. This auto-removes the container after it finishes the command handed to it via /bin/sh.