Docker, what is it?

A quick explanation of Docker for beginners. Why use it? How does it work? Where to start? 2020-05-11

Find more tutorials explaining the basics of a subject in ten minutes under the tag "what is it".

Docker is a free tool that lets you run software in an isolated environment. In other words, you can use it to install and run software without polluting your own machine: Docker creates a brand new workspace, as if you were working on another machine.

The isolated box executing the software and its dependencies is called a container. A container is close in concept to a virtual machine, but it is absolutely not one! As a newbie, though, the virtual machine is probably the closest concept you know. So, for now, nobody will blame you if you picture containers that way.

Where does this container come from? How is it built?

This is a theoretical tutorial. Learn how to put Docker into practice in the tutorial: Docker, my first container.

Docker images and Dockerfile

A Docker image is a package, a kind of archive, containing the software to run and all the files needed to run containers based on that image. If you know the *.iso files that can be written to USB keys, it's a similar concept.

A container is an instance of an image. In other words, the image describes a system and a container is a concrete implementation of this description. Some analogies:

The image is the architectural plan and the container is the house.
The image is the PHP class and the container is the PHP object.
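
The class/object analogy can be made concrete with a few lines of Python. This is only a toy model: the Image and Container classes below are hypothetical names invented for the illustration and have nothing to do with Docker's real API. It simply shows that several containers can be created from one image and that each one lives its own life.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Image:
    """Read-only description of a system (toy model, not a Docker API)."""
    name: str
    files: tuple  # files baked into the image


@dataclass
class Container:
    """A concrete, mutable instance created from an image."""
    image: Image
    files: list = field(default_factory=list)

    @classmethod
    def from_image(cls, image: Image) -> "Container":
        # Each container starts with its own copy of the image's files.
        return cls(image=image, files=list(image.files))


nginx_image = Image(name="nginx", files=("nginx.conf",))
first = Container.from_image(nginx_image)
second = Container.from_image(nginx_image)

# Changing one container affects neither the image nor its sibling.
first.files.append("access.log")
print(first.files)        # ['nginx.conf', 'access.log']
print(second.files)       # ['nginx.conf']
print(nginx_image.files)  # ('nginx.conf',)
```

The image is frozen on purpose: in this sketch, as with Docker, you do not modify an image in place, you create containers from it.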

OK, so if I want to run a container, I first need an image: where does this image come from? How is it built?

You can create your own image from scratch, but almost nobody does that. You will most likely base your image on an existing one built by the community.

A Docker registry is a place where you can find images. You can create your own private registry to store your images, or you can rely on a public one such as hub.docker.com. On these public hubs, you can find both customized and official images for most well-known technologies. The official images are provided by the Docker community or directly by the company maintaining the software; they are generally well tested, secured and maintained. For instance, if you want an Nginx HTTP server, don't reinvent the wheel: use the Nginx image from Docker Hub.

For your use cases, you will surely need to customize the base images: change the exposed port, add a mount point to make files available inside the container and so on. There are several ways to do so:

First option: you can run a container from the base image, make your modifications, then save the result as a new image. Pros: the modifications are hardcoded in your image once and for all, and the image can itself serve as a basis for new images. Cons: you will need to store the image somewhere and, I forgot to tell you, images can be really heavy, sometimes several gigabytes.
Second option, the one you will choose in 99% of cases: you write a Dockerfile. A Dockerfile is an easy-to-understand text file describing how to build your own image from a base image. The Dockerfile describes many things: the commands to execute while building the customized image, the command to execute when launching a container, the ports to expose to other containers, the volumes to mount in order to share files between the host and a container, and so on. Being just a text file, the Dockerfile is easy to share.
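
To give an idea of what such a file looks like, here is a small Python sketch that assembles the text of a Dockerfile. FROM, RUN, EXPOSE, VOLUME and CMD are real Dockerfile instructions, but the render_dockerfile helper and the values passed to it (the welcome page, the port, the path) are assumptions chosen for the illustration, not a recommended configuration. Note also that VOLUME only declares a mount point inside the container: the matching host directory is chosen when the container is started.

```python
import json


def render_dockerfile(base_image, run_commands, exposed_ports, volumes, start_command):
    """Build the text of a Dockerfile from a few Python values (hypothetical helper)."""
    lines = [f"FROM {base_image}"]                           # the base image
    lines += [f"RUN {command}" for command in run_commands]  # executed at build time
    lines += [f"EXPOSE {port}" for port in exposed_ports]    # ports the container listens on
    lines += [f"VOLUME {path}" for path in volumes]          # mount points in the container
    # Exec form of CMD: a JSON array with the program and its arguments,
    # executed when a container is launched.
    lines.append("CMD " + json.dumps(start_command))
    return "\n".join(lines) + "\n"


dockerfile_text = render_dockerfile(
    base_image="nginx",
    run_commands=["echo 'hello' > /usr/share/nginx/html/index.html"],
    exposed_ports=[80],
    volumes=["/var/log/nginx"],
    start_command=["nginx", "-g", "daemon off;"],
)
print(dockerfile_text)
```

In real life you would write the resulting five lines by hand in a file named Dockerfile; the sketch only makes the structure of the file explicit.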

Microservices and responsibilities

Let's take an example: you want to create a website with the PHP language, the Apache server, the MySQL database management system and the Memcached caching system. You want this whole technical stack to run in Docker containers.

The first choice is to take the image of a raw OS, like Ubuntu, and write a single Dockerfile describing all the elements of the technical stack: "install PHP then install MySQL then...". It's easy to do but, as is often the case, the first choice is not the best. By creating a monolithic installation, you make it difficult to maintain in the long term:

You cannot easily upgrade one of the components without affecting the whole stack.
You cannot reuse one of the components in another context. For instance you might want to use the same database system for another project.
As your project becomes more and more successful, you will need to replicate parts of your installation on several machines in order to absorb the load of your users. You will not want to deploy the same number of front-end, cache and database servers.

All of this leads to the second choice: analyze the responsibilities of your stack and split them into several services. Each service can itself be split into smaller services. From this analysis, you will deduce the different pieces of your architecture and the Docker images to build.

In our example, we can identify: a front-end service receiving the requests from the users, a database service storing data and serving it to the front-end service, and a cache service relieving the load on the database. The result is one Dockerfile for MySQL, one for Apache and PHP, and one for Memcached.
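
Here is a minimal Python sketch of what this split buys you. The Service class, the scale function and the image names are assumptions made for the illustration; this is a plain data model, not how Docker or any orchestrator is actually configured. The point is that each service has its own image and its own number of copies, so one of them can be scaled without touching the others.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """One responsibility of the stack (toy model, not a Docker API)."""
    name: str
    image: str         # base image the service's Dockerfile would start from
    replicas: int = 1  # number of containers running this service


stack = {
    "frontend": Service("frontend", image="php:apache"),
    "database": Service("database", image="mysql"),
    "cache": Service("cache", image="memcached"),
}


def scale(services, name, replicas):
    """Change the replica count of one service, leaving the others untouched."""
    services[name].replicas = replicas


# The site gets popular: add front-end containers only.
scale(stack, "frontend", 4)

for service in stack.values():
    print(f"{service.name}: {service.replicas} x {service.image}")

total_containers = sum(service.replicas for service in stack.values())
print(total_containers)  # 6
```

With the monolithic Dockerfile from the first choice, the same change would have meant replicating PHP, MySQL and Memcached together.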

Couldn't we split Apache and PHP into two distinct services?

We could indeed have dug deeper and split some services into sub-services: it's up to you to decide which granularity you want for your stack. Always try to find the right balance between complexity and maintainability.

Again, base images for the main technologies are already available on the public registries. Always check these registries before reinventing the wheel. For our example, there are already images for PHP, Apache, MySQL and Memcached.

Docker container vs virtual machine

You told us that a Docker container is not a virtual machine. So, what is it?

Virtual machines are isolated environments containing a complete execution stack: the operating system, the libraries and the whole ecosystem. The instructions are managed by the OS of the virtual machine. This is virtualization.

Docker relies on native Linux kernel features, mainly namespaces and control groups (cgroups). Early versions of Docker used the "Linux containers" tool, or LXC, to drive them. The instructions are managed directly by the kernel of the host. Docker serves as the global orchestrator: it shares resources, creates virtual networks between the containers and so on. This specific kind of virtualization is called containerization.

Containers generally start faster and are lighter than virtual machines, both on disk and in memory. Note that I didn't say "better": both technologies are useful depending on the needs.

Why is Docker so popular these days?

Here is a summary of what you learned, which explains on its own why Docker is such a widely used technology.

Fast and lightweight thanks to its direct use of the Linux kernel.
Easy to deploy on new machines, whether to onboard newcomers in a team or to replicate your production environment.
A strong community supports the product and freely shares resources for all your needs.
Natively supported by many container managers and orchestrators, like Kubernetes, allowing the deployment of robust architectures and the creation of cloud platforms.

Learn how to put Docker into practice in the tutorial: Docker, my first container.

Have a nice day :)