I want to start by tackling two very important questions that we are going to be answering throughout this blog post:
- What is Docker?
- Why do we use Docker?
Let’s first answer why we use Docker by going through a quick little demo.
Let’s have a look at this flow diagram.
This is probably a process you’ve gone through at least once in your life.
It’s the flow of installing software on your personal computer, and I bet that at least once for you, as for me, this is what has happened.
You download an installer, you run that installer, and then inevitably at some point you end up getting an error message during the installation.
Then what do you do?
Well, you probably troubleshoot the issue by googling it. You try to find a solution that eventually solves the issue. You then rerun the installer, only to find out that, hey, some other error is appearing!
And then you try to go through this entire troubleshooting process again.
So, this is at its core what Docker is trying to fix.
Docker wants to make it really easy and straightforward for you to install and run software on any given computer.
Not just on your personal laptop or desktop, but on web servers as well, such as Google Cloud Platform, AWS, or any other cloud-based computing platform.
The point is that with the usual way of installing software you can get into this endless cycle of troubleshooting as you are installing and running it. And then, when you release this software or move it to a new environment, you kind of have to do it all again and again.
So, let me now show you how easy it is to run, for example, a Docker container with Redis.
If you’re making use of Docker, you can just go to your command line and run one single command:
$ docker run -it redis
And then, after a very brief pause, you will have an instance of Redis up and running on your computer. No issues, no troubleshooting. Ready to go!
And that’s pretty much it.
That is Docker in a nutshell. That is how easy it is to run software when you’re making use of Docker.
So, to answer the question “Why do we use Docker?” very directly: we use it because it makes it really easy to install and run software without having to go through a whole bunch of setup or installation of dependencies.
Now, we’re going to learn a bit more throughout this post about why we use Docker.
But I just wanted to give you a very quick example and show you how easy it can be to get some new piece of software up and running when you are using Docker.
Now, let’s answer the question “What is Docker?”.
Well, this question is a lot more challenging to answer. Any time you see someone refer to Docker in a blog post or an article or a forum or wherever it might be, they’re kind of making reference to an entire ecosystem of different projects, tools and pieces of software.
So, if someone says that they used Docker on their project they might be referring to Docker Client or Docker Server, they might be referring to Docker Hub or Docker Compose.
Again, these are all projects, tools, pieces of software that come together to form a platform or ecosystem around creating and running something called containers.
Now, what is a container?
When we ran that command at the terminal just a moment ago,
$ docker run redis
a little series of actions took place behind the scenes, and there were two important parts to that process.
When I ran that command, Docker reached out to something called Docker Hub and downloaded a single file called an Image.
An image is a single file containing all the dependencies and all the configuration required to run a very specific program.
For example, Redis, which is what the image we just downloaded was supposed to run.
This is a single file that gets stored on your hard drive, and at some point in time you can use this image to create something called a container.
A container is an instance of an image.
And you can kind of think of it as being like a running program.
So, at a high level, a container is a program with its own isolated set of hardware resources.
So, it kind of has its own little space of memory, its own little space of networking, and its own little space on the hard drive as well.
So, we didn’t really answer the question here of what Docker is, but we did learn at least that a reference to Docker is really talking about a whole collection of different projects and tools.
And we also learnt two important pieces of terminology, a Docker Image and a Container.
Now these images and containers are the absolute backbone of what you are going to be working with throughout the use of Docker.
At this point you need to install some software onto your machine so that you can work directly with images and containers and we can get a better idea of how they work.
So, here’s what you have to do. (Hopefully you have a Mac like me 🙂 The installation on Windows is not that easy, because on Windows you will need to install a VM running a Linux OS.)
Now we’re going to be installing a piece of software called Docker CE. Inside of this program are two very important tools that we’re going to be making use of throughout this post.
The first tool that’s inside this package is something called the Docker Client. The Docker Client (also known as the Docker CLI) is a program that you are going to interact with quite a bit from your terminal.
You are going to enter commands in your terminal and issue them to the Docker Client, and the Docker Client is going to take your commands and figure out what to do with them.
Now, the Docker Client itself doesn’t actually do anything with containers or images. Instead, the Docker Client is really just a tool or a portal of sorts to help us interact with another piece of software that is included in this Docker CE package, called the Docker Server. This is also frequently called the Docker Daemon.
This program right here is the actual tool or the actual piece of software that is responsible for creating containers, images, maintaining containers, uploading images and doing just about everything you can possibly imagine around the world of Docker.
So, it’s the Docker Client that you issue commands to. It’s the thing that we interact with, and behind the scenes this client is interacting with the Docker Server.
You are never going to really reach directly out to the Docker Server.
It’s something that’s just kind of running behind the scenes.
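To make that split a bit more concrete, here is a tiny toy model in Python. It is only a sketch of the idea, not how Docker is actually implemented: the names ToyClient and ToyServer are made up for this post, and the real client does not call the server through a plain Python method like this.

```python
class ToyServer:
    """Stands in for the Docker Server (daemon): it does the actual work."""

    def __init__(self):
        self.handled = []  # every request the server has received so far

    def handle(self, action, image):
        self.handled.append((action, image))
        return f"server: '{action}' requested for image '{image}'"


class ToyClient:
    """Stands in for the Docker Client (CLI): it only parses and forwards."""

    def __init__(self, server):
        self.server = server

    def command(self, line):
        # Split "docker run redis" into its three words.
        program, action, image = line.split()
        if program != "docker":
            raise ValueError(f"not a docker command: {line!r}")
        # The client does no container work itself; it hands the request over.
        return self.server.handle(action, image)


server = ToyServer()
client = ToyClient(server)
reply = client.command("docker run redis")
print(reply)
```

The only thing the toy client knows how to do is understand your command and pass it along, which is the whole point of the split.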
Let’s see how the set up process works on a Mac OS machine.
Here’s what you need to do.
We’re going to first go to the Docker CE page at https://store.docker.com/editions/community/docker-ce-desktop-mac where you are going to sign up for a Docker Hub account.
We need a Docker Hub account not only to download the Docker CE for Mac, but we also need it for some stuff that we’re going to do later on.
It is a really straightforward installation. I am not going to explain how to install software on a Mac, I assume you know that.
Now, once you get this thing installed, we’re going to launch Docker by double-clicking it. The first time you do this, it’s going to appear that nothing at all is happening.
But if you look up at the top right-hand side of your screen you’ll see this little kind of whale icon with a set of boxes on it.
And if you click on that you’ll see something that says Docker is starting.
After a minute or two it’s going to eventually resolve, and the boxes are going to go completely steady.
That means that Docker has successfully booted up and everything is set up on your local machine.
Now the last thing that you have to do is make sure that you log into Docker.
So, you’re going to click on the little whale icon and then you’re going to see a button that says “log in with Docker id”. Click on that and log in with the credentials you just created on the Docker web page.
Now, once you do that, the very last thing we need to do is make sure that everything was set up correctly.
So, for this you’re going to open up your terminal (you can open it in any folder, it doesn’t matter) and run a single command:
$ docker version
You should see something like this appear on the screen.
That means that you are all set and you’re good to go.
If you get an error message it means that something went wrong with the installation process and you’ll want to jump over and google your issue. You are welcome 🙂
I now want you to write out our very first kind of meaningful command with the Docker Client or the Docker CLI.
You’re going to run a very quick command here and then we’re going to go through a very specific flow of actions that occurred when that command got executed.
You’re going to write
$ docker run hello-world
Yes, it is the usual Hello World thing, the same thing you do with every new thing you learn in computer science :).
But I promise it’s going to be rather interesting.
So, write the command and then press enter and you’re going to very quickly see a lot of text start to scroll along the screen.
Now, if you scroll up a little bit you’ll see a little “Hello from Docker!” message right there, and underneath that you’ll notice it lists out the series of steps that just occurred when you ran that command.
And if you look closely, right after the command, you can see that it says “unable to find image hello-world locally”.
So, with that in mind, let’s go take a look at a couple of diagrams that are going to help explain what just occurred when we ran that command.
So, at the terminal you executed the command
$ docker run hello-world
That starts up the Docker Client, or Docker CLI.
Again, the Docker CLI is in charge of taking commands from you, doing a little bit of processing on them, and then communicating them over to the Docker Server. It’s the Docker Server that is really in charge of the heavy lifting.
When we ran the command
$ docker run hello-world
that meant that we wanted to start up a new container using the image with the name hello-world. The hello-world image has a tiny little program inside of it whose sole purpose, sole job, is to print out the message that you see in your console.
That’s the only purpose of that image.
Now when we ran that command and it was issued over to the Docker Server, a series of actions very quickly occurred in the background.
The Docker Server saw that we were trying to start up a new container using an image called hello-world.
The first thing that the Docker Server did was check to see if it already had a local copy of the hello-world image, that is, a copy of the hello-world file on your personal machine.
So, the Docker Server looked into something called the Image Cache.
Now because you had just installed Docker on your personal computer that Image Cache is currently empty.
You have no images that have already been downloaded before. (As long as you haven’t used Docker before).
So, because the Image Cache was empty, the Docker Server decided to reach out to a free service called Docker Hub. Docker Hub is a repository of free public images that you can download and run on your personal computer. So the Docker Server reached out to Docker Hub and said: hey, I’m looking for an image called hello-world. Do you have one?
Of course, Docker Hub does have one, so the Docker Server downloaded this hello-world file and stored it on your personal computer in the Image Cache, where it can be rerun very quickly at some point in the future without having to be re-downloaded from Docker Hub.
After that, the Docker Server then said OK great, I’ve got this image and now it’s time to use it to create an instance of a Container.
And remember what we said about a container a moment ago: a Container is an instance of an Image.
Its sole purpose is to run one very specific program. So the Docker Server essentially took that single file, loaded it up into memory, created a container out of it and then ran a single program inside of it, and that single program’s purpose was to print out the message that you see right here.
That’s what happens when you run this docker run command.
It reaches out to Docker Hub.
It grabs the image and then it creates a container out of that image.
Now you’ll notice a kind of interesting thing. If you run
$ docker run hello-world a second time, you are not going to see the downloading or “unable to find image locally” messages that we saw the first time.
And that is because we already downloaded it to our Image Cache on our personal computer.
So, the big lesson here is that the first time that you try to make use of any of these public images you’re going to have to do a little bit of a download.
But then in the future after that, you can start up a container using that image much more quickly because the image has already been downloaded to your computer.
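If it helps, the whole flow we just walked through can be sketched in a few lines of Python. This is a toy model under loud assumptions: TOY_DOCKER_HUB and ToyDockerServer are names invented for this post, the hub is just a dictionary, and the log messages are paraphrased rather than copied from Docker’s real output.

```python
# Hypothetical stand-in for Docker Hub: image name -> what its program prints.
TOY_DOCKER_HUB = {"hello-world": "Hello from Docker!"}


class ToyDockerServer:
    def __init__(self):
        self.image_cache = {}  # empty on a fresh installation
        self.log = []          # what the server did, in order

    def run(self, image_name):
        if image_name not in self.image_cache:
            self.log.append(f"unable to find image {image_name} locally")
            if image_name not in TOY_DOCKER_HUB:
                raise LookupError(f"no image called {image_name} on the hub")
            # Download once and keep the copy in the local cache.
            self.image_cache[image_name] = TOY_DOCKER_HUB[image_name]
            self.log.append(f"downloaded {image_name} from the hub")
        # Create a "container" from the cached image and run its one program.
        self.log.append(f"started a container from {image_name}")
        return self.image_cache[image_name]


server = ToyDockerServer()
first_output = server.run("hello-world")   # cache miss: download, then run
second_output = server.run("hello-world")  # cache hit: run straight away
for entry in server.log:
    print(entry)
```

The second call skips the “unable to find” and “downloaded” steps entirely, which is the same behaviour you see at the terminal when you run the command twice.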
But we still haven’t answered what a container is.
Now, to understand containers you first need a little bit of background on exactly how your operating system runs on your computer.
So, let’s start with a quick overview of the operating system on your computer. Most operating systems have something called a kernel.
The kernel is a running software process that governs access between all the programs that are running on your computer and all the physical hardware that is connected to your computer as well.
So, at the top of this diagram we have different programs that your computer is running, such as Chrome, Terminal, Spotify, NodeJS, etc.
If you’ve ever made use of NodeJS before and you’ve written a file to the hard drive, it’s technically not NodeJS that is speaking directly to the physical device. Instead, NodeJS says to your kernel: hey, I want to write a file to the hard drive.
The kernel then takes that information and eventually persists it to the hard disk.
So the kernel is always this kind of intermediate layer that governs access between these programs and your actual hard drive.
The other important thing to understand here is that these running programs interact with the kernel through things called system calls.
These are essentially like function invocations.
The kernel exposes different endpoints to say: hey, if you want to write a file to the hard drive, call this endpoint or this function right here. It takes some amount of information, and that information will eventually be written to the hard disk or memory or whatever else is required.
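Here is what that looks like from a program’s point of view, using Python instead of NodeJS. On Unix-like systems the functions os.open, os.write and os.close are thin wrappers around the operating system’s own open, write and close calls, so this small sketch is about as close to “asking the kernel” as a script gets. The file name and its contents are made up for the example.

```python
import os
import tempfile

# Work inside a scratch directory so the example does not touch real files.
with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "note.txt")

    # Ask the operating system to open the file, creating it if needed.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    # Hand the bytes to the operating system; it reports how many it accepted.
    written = os.write(fd, b"hello kernel\n")
    # Tell the operating system we are done with the file.
    os.close(fd)

    # Read the file back to confirm that the data was persisted.
    with open(path, "rb") as handle:
        content = handle.read()

print(written, content)
```

Notice that the program never talks to the disk itself. It only makes requests and gets answers back.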
Now thinking about this entire system right here I want to pose a kind of hypothetical situation to you.
I want you to imagine for just a second that you have two programs running on your computer.
Now imagine that we’re in a crazy world where Chrome, in order to work properly, has to have Python version 2 installed, and NodeJS has to have Python version 3 installed (it is not such a crazy world after all).
However, on our hard disk we only have access to Python version 2, and for whatever crazy reason we are not allowed to have two installations of Python at the same time.
So, as it stands right now Chrome would work properly because it has access to version 2 but NodeJS would not because we do not have a version or a copy of Python version 3.
Again, this is a completely make-believe situation.
I just want you to kind of consider this for a second because this is kind of leading into what a container is.
So how can we solve this issue?
Well, one way to do it would be to make use of an operating system feature known as namespacing. With namespacing we can look at all of the different hardware resources connected to our computer and essentially segment out portions of those resources. So we could create a segment of our hard disk specifically dedicated to housing Python version 2, and a second segment specifically dedicated to housing Python version 3.
Then, to make sure that Chrome has access to the right segment and NodeJS has access to the other one, any time either of them issues a system call to read information off the hard drive, the kernel will look at that incoming system call and try to figure out which process it is coming from.
So the kernel could say: “Okay, if Chrome is trying to read some information off the hard drive, I’m going to direct that call over to the little segment of the hard disk that has Python version 2. And each time NodeJS makes a system call to read the hard drive, I’m going to redirect it over to the other segment that has Python version 3.”
And so, by making use of this kind of namespacing or segmenting feature, we can make sure that Chrome and NodeJS are able to work on the same machine.
Now again, in reality neither of these actually needs an installation of Python.
This is just a quick example.
So, this entire process of segmenting a hardware resource based on the process that is asking for it is known as namespacing. With namespacing we are allowed to isolate resources per process or group of processes, and we are essentially saying that any time a particular process asks for a resource, we’re going to direct it to its own little specific area of the given piece of hardware.
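The make-believe Chrome and NodeJS situation can be written down as a small Python toy. To be clear, this is only a model of the idea: ToyKernel, the segment names and the file path are all invented for this post, and real Linux namespaces are a kernel feature, not a dictionary lookup.

```python
class ToyKernel:
    def __init__(self):
        self.segments = {}    # segment name -> {path: contents}
        self.segment_of = {}  # process name -> segment name

    def create_segment(self, name, files):
        self.segments[name] = dict(files)

    def assign(self, process, segment_name):
        self.segment_of[process] = segment_name

    def read(self, process, path):
        # The "system call": work out who is asking, then look only in
        # the segment that belongs to that process.
        segment = self.segments[self.segment_of[process]]
        return segment[path]


kernel = ToyKernel()
kernel.create_segment("segment-1", {"/usr/bin/python": "Python 2"})
kernel.create_segment("segment-2", {"/usr/bin/python": "Python 3"})
kernel.assign("chrome", "segment-1")
kernel.assign("nodejs", "segment-2")

# Both processes ask for the very same path...
chrome_sees = kernel.read("chrome", "/usr/bin/python")
nodejs_sees = kernel.read("nodejs", "/usr/bin/python")
# ...but each one is directed to its own segment.
print(chrome_sees, "|", nodejs_sees)
```

Both processes believe there is exactly one Python on the machine, and neither can see the other’s copy.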
Now, namespacing is not only used for hardware; it can be used for software elements as well.
So, for example, we can namespace a process to restrict the area of a hard drive that is available, or the network devices that are available, or the ability to talk to other processes, or the ability to see other processes.
These are all things that we can use namespacing for, to essentially limit the resources available to a particular process or redirect its requests for a resource.
Very closely related to this idea of namespacing is another feature called Control Groups. A control group can be used to limit the amount of resources that a particular process can use.
So, namespacing is for saying: hey, this area of the hard drive is for this process. A control group, on the other hand, can be used to limit the amount of memory that a process can use, the amount of CPU, the amount of hard drive input/output and the amount of network bandwidth as well.
So, these two features put together can be used to really isolate a single process and limit the amount of resources it can talk to and, essentially, the amount of bandwidth it can make use of.
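If namespacing answers “which area is yours?”, a control group answers “how much can you use?”. Here is a deliberately tiny Python sketch of that second idea. ToyControlGroup is a made-up name, and the real feature is enforced by the Linux kernel rather than by a polite Python object, so treat this purely as an illustration of the concept.

```python
class ToyControlGroup:
    """Caps how much memory (in MB) the processes in the group may hold."""

    def __init__(self, memory_limit_mb):
        self.memory_limit_mb = memory_limit_mb
        self.used_mb = 0

    def allocate(self, amount_mb):
        # Refuse any request that would push usage past the limit.
        if self.used_mb + amount_mb > self.memory_limit_mb:
            return False
        self.used_mb += amount_mb
        return True


group = ToyControlGroup(memory_limit_mb=100)
first_request = group.allocate(60)   # fits: 60 of 100 MB used
second_request = group.allocate(60)  # would need 120 MB, so it is refused
print(first_request, second_request, group.used_mb)
```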
Now as you might imagine this entire kind of little section that you can see in the images above, this entire vertical of a running process plus that little segment of resource that it can talk to, is what we refer to as a container.
And so, when people say “oh yeah, I have a container”, you really should not think of it as being a physical construct that exists inside of your computer.
Instead, a container is really a process or a set of processes that has a grouping of resources specifically assigned to it.
This is a diagram that can be quite handy any time you think about a container.
We’ve got some running process that sends a system call to a kernel. The kernel is going to look at that incoming system call and direct it to a very specific portion of the hard drive, the RAM, CPU or whatever else it might need.
And a portion of each of these resources is made available to that singular process.
Now, the last question you might have here is: what is the real relation between one of those containers, that kind of singular process and grouping of resources, and an image?
How does that single file eventually create this container?
Well, let’s have a look at another diagram. Any time that we talk about an image, we’re really talking about a file system snapshot.
So, this is essentially kind of like a copy-paste of a very specific set of directories or files.
And so we might have an image that contains just Chrome and Python. An image will also contain a specific startup command.
So, here’s what happens behind the scenes when we take an image and turn it into a container.
First off, the kernel is going to isolate a little section of the hard drive and make it available to just this container.
And so, we can kind of imagine that after that little subset is created the file snapshot inside the image is taken and placed into that little segment of the hard drive.
And so, now inside of this very specific grouping of resources we’ve got a little section of the hard drive that has just Chrome and Python installed and essentially nothing else.
The startup command is then executed, which in this case we can imagine is something like “start up Chrome, just run Chrome for me”.
And so, Chrome is invoked.
We create a new instance of that process and that created process is then isolated to this set of resources inside the container.
That is the relationship between a container and an image, and it’s how an image is eventually taken and turned into a running container.
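Those steps are small enough to sketch in Python. As before, this is a toy under stated assumptions: ToyImage, ToyContainer and create_container are names made up for this post, and copying a dictionary is only a stand-in for whatever Docker really does with the snapshot.

```python
import copy
from dataclasses import dataclass


@dataclass
class ToyImage:
    filesystem: dict      # the file system snapshot: path -> contents
    startup_command: str  # what to run once the container exists


@dataclass
class ToyContainer:
    filesystem: dict  # this container's private section of the disk
    running: str      # the process started from the startup command


def create_container(image):
    # Set aside a private section and place a copy of the snapshot in it.
    private_files = copy.deepcopy(image.filesystem)
    # Execute the startup command inside that isolated section.
    return ToyContainer(filesystem=private_files, running=image.startup_command)


image = ToyImage(
    filesystem={"/apps/chrome": "chrome program", "/apps/python": "python program"},
    startup_command="run chrome",
)
container_a = create_container(image)
container_b = create_container(image)

# A change inside one container touches neither the image nor its sibling.
container_a.filesystem["/tmp/notes"] = "written by container a"
print(sorted(container_a.filesystem))
print(sorted(container_b.filesystem))
```

One image can be turned into as many containers as you like, and each of them starts from the same snapshot but lives in its own little space.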
Just a quick recap, and then one last thing I want to add.
So, we said that a container is a running process along with a subset of physical resources on your computer that is allocated specifically to that process.
We also spoke a little bit about the relationship between an image and a running container.
Remember, an image is really a kind of snapshot of the file system along with a very specific startup command.
Now, one more thing. Earlier we spoke a bit about the separation, or isolation, of these resources through a technique called namespacing.
And we also said that we could limit the amount of resources used by means of Control Groups as well.
Now, these features, Namespacing and Control Groups, are not included by default with all operating systems.
Even though I had kind of specifically said your operating system has a kernel, these features of Namespacing and Control Groups are specific to the Linux operating system.
So namespacing and control groups belong to Linux, not to Windows and not to Mac OS.
Now, that might make you wonder how you are running Docker right now. What’s happening then?
After all, we are running a Docker Client and we are running Docker containers on a Mac OS or Windows operating system.
How is that happening if these are Linux specific features?
Well here is what is happening behind the scenes:
When you installed Docker for Mac or Docker for Windows you installed a Linux virtual machine.
So, as long as Docker is running, you technically have a Linux virtual machine running on your computer. Inside of this virtual machine is where all these containers are going to be created.
So, inside the virtual machine we have a Linux kernel. That Linux kernel is going to be hosting the running processes inside of containers, and it’s that Linux kernel that is going to be in charge of limiting, constraining, or isolating access to the different hardware resources on your computer.
You can actually see this Linux virtual machine in practice by opening up your terminal and running the
$ docker version command again. Looking at the Server section, you’ll notice that there’s an OS entry, and it probably doesn’t have your operating system listed. It should say Linux as the operating system.
So that is kind of specifying that you are running a Linux Virtual machine and that’s what’s being used to host all these different containers that you are going to be working with.
There’s still a tremendous amount more to learn about containers and images.
But this is not meant to be a comprehensive Docker course. It is just meant to be a high-level overview and introduction to the Docker world, something that can help you better understand how it works even if you are not an expert developer or engineer, or if you just want to figure out what it is but don’t really want or need to use it.
Many blockchain projects rely heavily on Docker for the creation and deployment of their networks, because it lets you define your network and then deploy it wherever you want, in minutes and without the need for any configuration.