By: Jeremy Pedersen
In the last few years, there has been a lot of hype around technologies such as Docker (the popular container engine), and Kubernetes (the equally popular system often used to manage said containers). Is the hype justified?
A quick search for terms like "cloud-native" or "containerization" might give you the impression that every application can and should be run from containers, preferably on a Kubernetes cluster.
Is this really true? Today, let's pick apart the hype and find some real answers.
Before deciding if the container hype is justified, we need to take a quick walk down memory lane.
In the bad old days, people deployed applications directly on top of a physical server (or virtual machine). This was sometimes accomplished by logging into that server by hand and installing software packages, compiling code, and tweaking configuration files. Sometimes, these steps were carried out by one or more automated tools (Ansible, anyone?).
The upside of this approach is that it is infinitely customizable. You can do almost anything you want. Any code you want to run, any library you want to install: as long as you can coax it to run in your environment, then great! Go nuts.
The downside is obvious: all that coaxing and configuring is a form of "state", that dreaded animal that software engineers are always trying to corral but which always seems to leak out and show up in places it isn't supposed to be.
In this case, what we're talking about is "environmental state" or "configuration state". All those libraries you installed, all those configuration files you edited, all the stuff you touched while you were setting things up. That stuff is "state".
Do you remember exactly what you did? Yes? Are you *sure*?
Simply logging into a server (or virtual machine) and configuring it to your tastes is great. As an IT person, this is the ultimate "wild west" sort of freedom. It feels good! Unfortunately, it also makes your servers very hard to replace. Each server is like a beloved family heirloom.
This might be OK for small-scale projects, but not for enterprise-grade application deployments.
When you're building and maintaining software for an "enterprise", you start hearing a lot of words that end in "bility". You know the ones I'm talking about: scalability, reliability, availability, maintainability...
...and quite a few more besides.
This is where containers start coming into their own. Done right, containers give you a way to standardize application deployments. We'll use Docker as an example since it's one of the most popular open-source container engines.
If you are following best practices (you are, aren't you?) you'll be creating your containers from a Docker image file. This image file was in turn created by the `docker build` command.
The job of the `docker build` command is to create a new container image file by following a set of instructions. These instructions are written down in a `Dockerfile`.
The `Dockerfile` is nothing but a text file with a list of step-by-step instructions in it. Instructions such as: start from this base image, run these commands, copy in these files, launch this program on startup...
...and so on. These steps are a recipe for creating a new docker image.
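For example, here is a minimal, hypothetical `Dockerfile` for a small Python web application. The file names and port are invented for the sketch:

```dockerfile
# Start from a known base image
FROM python:3.10-slim

# Set the working directory inside the image
WORKDIR /app

# Install dependencies first (this layer is cached between builds)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy in the application code
COPY . .

# Document the port the app listens on, and set the startup command
EXPOSE 8000
CMD ["python", "app.py"]
```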
Because you are following this recipe in exactly the same way every time, you can be guaranteed that you'll get the same image every time you run `docker build` on your `Dockerfile` (so long as you don't change the file, of course).
This is very cool stuff! As long as you are careful to include everything in your `Dockerfile` and avoid logging into your containers to make changes, then you have successfully trapped all of your "configuration state" and "environment state" in the container image file. Nice!
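Building and running then comes down to two commands; the tag `my-app:1.0` is an arbitrary name chosen for this sketch:

```bash
# Build an image from the Dockerfile in the current directory
docker build -t my-app:1.0 .

# Start a container from that image
docker run --rm my-app:1.0
```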
This all seems wonderful! We have a way to package up our application code, runtime, and configuration into a nice, flat file that we can feed to the Docker engine. Any time we want a new copy of our application, we simply use Docker to fire up a new container from our container image. Cool!
We do still have a couple of problems though. For one thing, "Application state" lives (and dies) in the container. This means we can't do things like run a database in our container. The minute the container is deleted, we've lost our database. Ditto for any other application data that needs to be saved long-term.
So now we need a way for containers to save things somewhere permanent. I guess we'll need to have a way for containers to mount external filesystems. Hmm...
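With plain Docker, the usual answer is a volume mount. The host path and password below are placeholders:

```bash
# Keep MySQL's data directory on the host so it outlives the container
docker run -d \
  -e MYSQL_ROOT_PASSWORD=change-me \
  -v /data/mysql:/var/lib/mysql \
  mysql:8
```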
Oh, and our containers need to talk to the Internet, and to each other. I guess I need a virtual network layer. Hmm...
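On a single host, Docker's user-defined networks cover the basics. The container and network names here are invented for the sketch:

```bash
# Create a user-defined network; containers attached to it can
# reach each other by container name
docker network create app-net
docker run -d --network app-net --name api my-app:1.0
docker run -d --network app-net --name db \
  -e MYSQL_ROOT_PASSWORD=change-me mysql:8
```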
And what about DNS, automatic service discovery, routing, load balancing, health checking, and... uh oh. Looks like we've got a problem.
This is where Kubernetes comes in and saves the day.
Running containers in production means you are going to have a lot of them. This means you need some tool to manage persistent storage, networking, inter-container communications, failover, scaling, monitoring, and so on. This is exactly what Kubernetes is designed to do. "Kubernetes" comes from Greek, and actually means "pilot" or "helmsman". Kubernetes "pilots" your containers, much like the pilot or helmsman on a modern ship.
Get it? Containers? Container ship? Pilot? Thought so.
Kubernetes is an open-source project, but it grew out of Google's "Borg" container management system, which Google has been using in-house since the early 2000s. If there is one thing Google loves, it's adding layers of abstraction. This has its advantages, to be sure, but it means Kubernetes can be difficult to learn.
You have to absorb half a dozen new concepts ("Pod", "Persistent Volume", "Deployment", "Service", etc...) before you can even begin using it, and the only way to tell Kubernetes what you want it to do is to write special configuration files, which you then pass to the cluster via a command-line tool (kubectl). Sounds fun, right?
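To make that concrete, here is a sketch of one such file: a Deployment that asks the cluster to keep three copies of the hypothetical `my-app` image from earlier running:

```yaml
# deployment.yaml: run three replicas of the application
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.0
          ports:
            - containerPort: 8000
```

You would then apply it with `kubectl apply -f deployment.yaml`.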
Worse, you quickly discover that deploying anything serious involves half a dozen of these configuration files, which you then need to write, maintain, and update as a group.
It's fair to say that Kubernetes solves the container orchestration problem, but it also creates a new set of configuration control issues.
As things stand, we have introduced containers to solve our state and consistency issues, and we've introduced Kubernetes to solve our container management issues.
Now we need some way to manage all the configuration files Kubernetes makes us write. Luckily, Helm comes to the rescue here.
Much like `apt`, `rpm`, or `yum` help you install software packages on Linux, Helm helps you install applications on Kubernetes clusters.
Helm does this using something called Charts, which bundle up all the Kubernetes configuration needed to run a given application.
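Installing something then feels a lot like using a package manager. For example, using Bitnami's public chart repository (the release name `my-blog` is arbitrary):

```bash
# Add a public chart repository, then install its WordPress chart
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-blog bitnami/wordpress
```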
At this point, we've got a toolchain that more or less does everything needed. We can run our containers with Docker. We can manage them with Kubernetes. We can tell Kubernetes what to install using Helm.
We could go further (after all...where are we going to store and version all the Helm Charts?) but I think this is a good stopping point.
This brings us to a critical question: was all this necessary?
Reaching back to the beginning of today's blog post, what was the problem we were trying to solve?
It was simple: we wanted a way to encode an application, its environment, and its configuration so that the application can be reproduced exactly, any time we want.
That was it!
Docker + Kubernetes + (optionally) Helm certainly does achieve this goal, and does a lot more besides. If you are Facebook or Google or Alibaba, there are a lot of reasons to go with a complex system like this. Especially if your applications are made up of lots of independent little "chunks" that can run in their own separate containers, communicating with each other as needed, over the network (this is often called a "microservices architecture").
But what if that isn't you? Is all this complexity really necessary for a WordPress blog? Is it even necessary for a site that has to scale to 10,000 or 100,000 users? 1,000,000 users? Where do we draw the line? What are the alternatives?
In answer to the question we started with: no, containers are not necessary for every application, nor are they a good fit for every application. The same goes for Kubernetes.
You may want to consider container technologies like Docker + Kubernetes if:
- Your application is built as lots of independent little "chunks" (microservices) that run in separate containers and talk to each other over the network
- You run enough containers that scheduling, scaling, failover, and monitoring genuinely need to be automated
- You operate at (or are heading toward) the kind of scale where Facebook, Google, or Alibaba-style tooling pays for itself
That's really about it. Unless your application is large enough or complex enough to meet these requirements, there are other ways to work that will end up being a lot simpler for smaller apps!
Ok, so we don't need all the complexity of Docker and Kubernetes to deploy our apps, but we still want to do as much of our configuration as possible before we get our application up and running. What are the alternatives?
What other method can we use to store an application along with its configuration and environment, if not Docker?
The answer is staring us right in the face: the humble disk image. That's right: a regular old VM disk image!
Most virtual machine software (and most cloud providers) support a mechanism for taking a 'snapshot' of a virtual machine's disk. Pair that with some manual (or automated) configuration steps, and we have a very simple way to save our application, its environment, and its configuration in a simple package we can deploy any time we want.
There are good open-source tools that already do this: my favorite is Packer. This tool - made by the same folks that make Terraform - is capable of spinning up a VM on Alibaba Cloud (or any other provider), running configuration steps from a script, shutting the machine down, and then making a disk image.
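As a rough sketch, a Packer template is a single HCL file. The field names below follow Packer's Alicloud builder, but the region, base image ID, and credentials are placeholders you would replace with your own (check your provider's builder docs):

```hcl
packer {
  required_plugins {
    alicloud = {
      version = ">= 1.0.0"
      source  = "github.com/hashicorp/alicloud"
    }
  }
}

# Where and how to launch the temporary build VM
source "alicloud-ecs" "app" {
  access_key    = "YOUR_ACCESS_KEY" # placeholder credentials
  secret_key    = "YOUR_SECRET_KEY"
  region        = "us-west-1"
  instance_type = "ecs.n1.tiny"
  source_image  = "ubuntu_22_04_x64_20G_alibase.vhd" # placeholder base image ID
  ssh_username  = "root"
  image_name    = "my-app-image" # name of the resulting disk image
}

# Configure the VM, then snapshot it into a reusable image
build {
  sources = ["source.alicloud-ecs.app"]

  provisioner "shell" {
    inline = [
      "apt-get update",
      "apt-get install -y nginx"
    ]
  }
}
```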
Pair this with the other features cloud providers make available (the ability to run bootstrapping scripts on VMs at boot time, the ability to create and destroy VMs on demand in response to changes in demand, etc...) and you can easily construct a robust, auto-scaling application architecture that does not rely on containers and doesn't need Kubernetes either. All just regular old VMs and your cloud provider's built-in platform features.
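For instance, the "bootstrapping script" feature (often called "user data") is just a script the provider runs on first boot. A hypothetical sketch, with a placeholder download URL:

```bash
#!/bin/bash
# Hypothetical first-boot ("user data") script. The URL, paths, and
# service name are placeholders; the systemd unit is assumed to have
# been baked into the disk image by Packer.
mkdir -p /opt/my-app
curl -fsSL https://example.com/releases/my-app.tar.gz | tar -xz -C /opt/my-app
systemctl start my-app
```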
For most if not all simple applications, this is going to be more than adequate.
Better still, Packer stores all its configuration in text files that can be versioned alongside your application's source code, just like you would do with your `Dockerfiles` or Kubernetes configuration. Easy!
And that's it for this week. See you next week!