Cloud Native and Containerisation (Joint Meetup with Docker Bangalore) – Docker Internals – Sangam Biradar

Container Overview – in short

● A common definition says “Containers are an abstraction at the app layer that packages code and dependencies together”. In practice, this means that only the application and the binaries and libraries it depends on are packaged into a container, without the extra baggage of a full operating system.

How does it work then? 

Containers use the host operating system's kernel and run in isolated user spaces. Multiple containers can run on one host because they share the host OS kernel, yet each runs in its own isolated user space with no visibility into the others. Seen from the container side, a container has a filesystem of its own and cannot see the host filesystem. A container also has its own process table, separate from the host OS process table. Remember that hardware virtualization is what enables virtual machines; containerization, by contrast, is all about operating system virtualization. Containers are lightweight because they do not depend on an additional layer such as a hypervisor. Instead, they use a layer of software called a container engine on top of the OS. Docker is one example of a container engine.

Containers have significantly less overhead than VMs. Because they share the kernel with the host OS, containers can start and stop extremely fast. Usually the startup time is just the time the container process takes to start.

A typical path to production takes the software through a development environment, a test environment and finally into the live or production state. Each of these stages involves installing and configuring the environment, including all the complex dependencies, which is roughly three times the effort. In the world of containers, if you intend to move your application, say, from your test environment to production, you just build the image and use the same image in the production environment.

Upgrading application software is not the same any more either. The traditional method involves upgrading your virtual machines, from the application's dependencies up to the application itself. With containers we come across “immutable infrastructure”, meaning there are no upgrade procedures any more: just delete your present containers and create new ones. The new containers can be spun up in a matter of seconds. There are also CI/CD tools that can aid this in a production environment from a different angle. We plan to cover this topic in detail in upcoming posts.

All the discussion above can be summed up in the diagrams below. Figure-1 shows how multiple layers are involved between the application and the host operating system. Figure-2 shows how the application and its dependencies are packaged, with the containers running directly on the host OS.

Containers operate in isolation. This isolation is aided by a few kernel features, one of them being Linux cgroups. This feature allows you to group a set of processes that run as a related unit. The group can then be controlled in terms of how much memory, CPU and I/O (both disk and network) it can use. Cgroups give fine-grained control over allocating, monitoring and managing system resources. Hardware resources can be divided among tasks and users, increasing efficiency.

Figure-1
Figure-2
Figure-3

However, Figure-2 gives the impression that the container engine is in the execution path between the application and the host OS, which is not the case. Figure-3 removes this ambiguity by showing the container engine as a daemon running on the host OS. The daemon generally interacts with the containers and the container images.

Some other examples of container technologies include LXC, OpenVZ, Linux VServer, BSD Jails, and Solaris zones.

Linux Namespaces

Namespaces are another kernel feature, one that gives a process a restricted view of the system. When you log in to a Linux machine, you see its file systems, processes and network interfaces. Once you create a container, you enter a set of namespaces that render a restricted view: the container has its own file systems, processes and network interfaces, different from those of the host machine it is running on. Containers, and Docker in particular, use multiple namespaces, and each process is in exactly one namespace of each type.
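One way to see that every process belongs to one namespace of each type is to look at the symlinks the kernel exposes under /proc/<pid>/ns. The following is a minimal Python sketch, assuming a Linux host with /proc mounted. The helper names read_namespaces and same_namespace are made up for illustration and are not part of Docker.

```python
import os


def read_namespaces(ns_dir="/proc/self/ns"):
    """Return {namespace type: identifier} for the symlinks found in ns_dir."""
    namespaces = {}
    for name in sorted(os.listdir(ns_dir)):
        path = os.path.join(ns_dir, name)
        if os.path.islink(path):
            # Each entry is a symlink whose target looks like "pid:[4026531836]".
            namespaces[name] = os.readlink(path)
    return namespaces


def same_namespace(a, b, ns_type):
    """Two processes share a namespace when their link targets are identical."""
    return ns_type in a and a.get(ns_type) == b.get(ns_type)


if __name__ == "__main__":
    # Only meaningful on Linux; elsewhere the directory does not exist.
    if os.path.isdir("/proc/self/ns"):
        for ns_type, ident in read_namespaces().items():
            print(f"{ns_type:20} {ident}")
```

Running this once on the host and once inside a container should, under these assumptions, print different identifiers for the namespace types the container does not share with the host.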

PID – (Process isolation) Processes within a PID namespace only see processes in the same PID namespace. Each PID namespace has its own numbering, which starts at 1. When PID 1 goes away, the whole namespace is killed. In the example below, an ubuntu container is spun up with an interactive bash terminal. The process table of this container has PID 1, which is the command /bin/bash. To summarise, the container lifecycle is tied to PID 1 in the container's process table: when PID 1 is killed, the container stops.

Net – (Network isolation) Processes within the network namespace get their own network stack. This includes network interfaces, routing tables, iptables and sockets.

Mnt – (Filesystem mount point isolation) Processes can have their own root filesystem, and this will be different from the host filesystem.

UTS – (Nodename and domainname isolation) This lets each container have a different hostname. When you log in to one of the containers and type uname -a, the hostname shown will be different from that of the other containers.

IPC – (Inter-process communication resource isolation) To put it in simple terms, if two containers try to access something with the same name, such as a shared memory segment or a message queue, a conflict would occur. The IPC namespace allows each container to use the same names and constructs without conflict. A container can have a private IPC namespace, a shareable one, or even use the host system's namespace. Below is an example of a shared IPC namespace.

User – The user namespace allows you to be a privileged user within the container while being a non-privileged user outside of it. You might be UID 1000 outside but UID 0 (root) inside. You will see different user tables on the host and in the containers.
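The translation between inside and outside IDs is described by /proc/<pid>/uid_map, where each line holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. The Python sketch below works on that format. The mapping string is a hypothetical example chosen to match the UID 1000 case above, and the function names are illustrative only.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside_start, outside_start, length) tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            inside, outside, length = (int(f) for f in fields)
            ranges.append((inside, outside, length))
    return ranges


def to_host_id(container_id, ranges):
    """Translate an ID seen inside the namespace to the host ID, or None if unmapped."""
    for inside, outside, length in ranges:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None


# Hypothetical mapping: root inside the container is UID 1000 on the host.
example_map = "         0       1000          1\n"
ranges = parse_id_map(example_map)
print(to_host_id(0, ranges))  # 1000: root in the container is UID 1000 outside
print(to_host_id(1, ranges))  # None: this ID has no mapping
```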

Docker Images

Linux Containers (LXC) arrived around 10 years back, so how did Docker become the talking point in the world of containers? The key differentiator was the Docker image. Never before could you encapsulate an application, its dependencies and its configuration files into a lightweight, portable bundle. This, along with the Docker Engine API, triggered the adoption of containers at a large scale.

A Docker image comprises multiple layers. You will see this when you do a “docker pull <image>”, which pulls the different layers of the image from the registry. Each layer of the image is an image on its own, and any changes made to the image are saved as layers on top of the base image layer, creating a nesting relationship as shown below. The image layers are read-only; the read-write layer is the thin top layer added when a container is created from the image.

Docker Networking

Each Docker container has its own network stack, and the NET namespace discussed above is what makes this possible. By default Docker creates three networks, as shown below, of which the bridge network (it appears as docker0 on the host) is the default networking type unless another is explicitly specified.

Cgroups

Control Groups (cgroups) are a mechanism for applying hardware resource limits and access controls to a process or collection of processes. The cgroup mechanism and the related subsystems provide a tree-based, hierarchical, inheritable and optionally nested mechanism of resource control. To put it simply, cgroups isolate and limit a given resource over a collection of processes to control performance or security. Cgroups can generally be thought of as implementing traditional ulimits/rlimits, but operating across groups of tasks or users: a newer, more powerful and more easily configured alternative. Despite doubts over code bloat or added complexity, the cgroups subsystem is a resource management and resource accounting/tracking solution that provides a generic process grouping framework.
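On Linux, the groups a process belongs to are listed in /proc/<pid>/cgroup, one line per hierarchy in the form hierarchy-ID:controller-list:cgroup-path. The Python sketch below parses that format. The sample text is made up for illustration (including the abc123 path component) and was not captured from a real host.

```python
from typing import NamedTuple, Tuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: Tuple[str, ...]
    path: str


def parse_proc_cgroup(text):
    """Parse the content of /proc/<pid>/cgroup into CgroupEntry records."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at most twice: the cgroup path itself may contain ':'.
        hierarchy_id, controllers, path = line.split(":", 2)
        names = tuple(c for c in controllers.split(",") if c)
        entries.append(CgroupEntry(int(hierarchy_id), names, path))
    return entries


# Made-up sample; real files differ per host and per cgroup version.
sample = (
    "5:memory:/docker/abc123\n"
    "3:cpu,cpuacct:/docker/abc123\n"
    "0::/docker/abc123\n"
)

for entry in parse_proc_cgroup(sample):
    # An empty controller list marks the cgroup v2 unified hierarchy.
    label = ",".join(entry.controllers) or "(cgroup v2 unified hierarchy)"
    print(f"{label:32} {entry.path}")
```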

It handles resources such as memory, CPU, networking and more.

Memory Cgroups: accounting
The memory cgroup keeps count of how much memory the processes in each group use, tracking every single memory page.
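With cgroup v2, the accounting result for a group is exposed as small text files in the group's directory: memory.current holds the bytes in use, and memory.max holds either a byte count or the literal word max when no hard limit is set. The Python sketch below reads both. It assumes a cgroup v2 layout, the directory path is something the caller has to supply, and the function names are illustrative only.

```python
import os


def parse_limit(raw):
    """memory.max holds either a byte count or the literal string 'max' (no limit)."""
    raw = raw.strip()
    return None if raw == "max" else int(raw)


def read_memory_usage(cgroup_dir):
    """Return (bytes in use, hard limit or None) for one cgroup v2 directory."""
    with open(os.path.join(cgroup_dir, "memory.current")) as f:
        current = int(f.read().strip())
    with open(os.path.join(cgroup_dir, "memory.max")) as f:
        limit = parse_limit(f.read())
    return current, limit


def describe(current, limit):
    """Format usage against the limit for display."""
    if limit is None:
        return f"{current} bytes used, no hard limit"
    return f"{current} of {limit} bytes used ({100 * current / limit:.1f}%)"
```

For a container, cgroup_dir would be that container's directory somewhere under /sys/fs/cgroup; the exact path depends on the host's cgroup driver and is not assumed here.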

Memory Cgroups: limits
Each group can have its own limits. There are two types of limits: soft limits and hard limits. Limits can set different

Layered Filesystems

Namespaces and cgroups are the isolation and resource sharing sides of containerisation. They’re the big metal sides and the security guard at the dock. Layered filesystems are how we can efficiently move whole machine images around: they’re why the ship floats instead of sinking. At a basic level, layered filesystems amount to optimising the call to create a copy of the root filesystem for each container.
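The idea of avoiding a full copy per container can be illustrated with a toy model in Python. This is only a sketch of the lookup order, not how any real storage driver is implemented: layers are plain dictionaries from file path to content, and collections.ChainMap stands in for the union view. All file names and contents are made up.

```python
from collections import ChainMap

# Read-only image layers, lowest first. File paths map to file contents.
base_layer = {"/bin/sh": "shell binary", "/etc/hostname": "base"}
app_layer = {"/app/server.py": "print('hello')", "/etc/hostname": "app-image"}
image_layers = [base_layer, app_layer]


def start_container(layers):
    """Give a container a fresh writable layer on top of the shared image layers."""
    writable = {}
    # ChainMap searches left to right, so the writable layer shadows the image
    # layers and upper image layers shadow lower ones. Writes go to the first map.
    return ChainMap(writable, *reversed(layers))


container_a = start_container(image_layers)
container_b = start_container(image_layers)

container_a["/etc/hostname"] = "container-a"  # lands in A's writable layer only

print(container_a["/etc/hostname"])  # container-a
print(container_b["/etc/hostname"])  # app-image
print(base_layer["/etc/hostname"])   # base (the shared layer is untouched)
```

Both containers reference the same two image dictionaries, so starting a second container costs one empty writable layer instead of a second copy of the whole filesystem.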

Security responsibility

Developers appreciate containers because they can package their application, test it alongside its libraries, and verify that it will work in production. Operations teams appreciate containers because they get the applications in a cohesive package along with their dependencies and configurations.

Container Runtime

Namespace sharing

Demo :

http://dockerlabs.collabnix.com/presentation/docker-internals.html#/

Thanks, Cloud Native Bangalore!

Thanks to hackr.io for sending swag.

Did you find Dockerlabs useful? Vote for us.

https://hackr.io/tutorial/dockerlabs-docker-and-kubernetes
