How it works: Docker
This is the first in the series ‘How it works’ where I explain how cool software works under the hood. Here we take a look at Docker, written in Go.
What is Docker?
Docker is basically a convenience tool around LXC and AUFS: it does things like mounting filesystems and starting lxc-start. It allows you to virtualize apps and run them in isolation in containers. We are looking at Docker version 0.6.2, build 081543c.
LXC is not so hard to understand; it’s just a bit intimidating because it throws all the semi-complicated parts of Linux together into one crazy piece of functionality.
Why am I reviewing Docker?
Containers are awesome and the future. They allow applications to be virtualized without the overhead of a virtual machine. A container is native, medium-portable (i.e. depending on the hardware) and medium-isolated (Docker breakouts are a thing), but it can be hardened. I expect containers to become the de facto deployment mechanism of the future.
Docker is the first tool that tries to popularize this concept. I think, however, that the current implementation of Docker isn’t explicit enough. Nevertheless, let’s look at how it works under the hood so you can write your own Docker.
Overview
There are a couple of things you need to understand when we go for a deep dive:
- First of all, Docker uses LXC containers; you can find info on them by typing man lxc or by checking out the resources below.
- Docker creates LXC config files named config.lxc, which configure networking and the mounting of apps.
- Docker uses AUFS (Another Union File System) to layer filesystems and give you a unified, combined view across multiple mount points.
- Docker images are basically root filesystems that an LXC container runs in.
- Docker mounts an image root-fs as read-only using a union filesystem, and creates another container/rw directory where the changes your container makes to this filesystem are written.
- Docker adds dropping of OS capabilities to config.lxc for the processes running in the container, so that they can’t use sys_nice, for instance.
Let’s get started
If you install Docker as described on the website for Ubuntu Raring, you can check out the file structure and contents of how Docker works like this.
First, sudo your way into a root shell:
sudo -i
Then go to:
cd /var/lib/docker/
The directory structure looks like this. I used the tree -L 3 command. If you don’t have tree, use apt-get install tree to install it.
.
|-- containers
|   `-- 75..19
|       |-- 75..19-json.log
|       |-- config.json
|       |-- config.lxc
|       |-- hostconfig.json
|       |-- hostname
|       |-- hosts
|       |-- rootfs.hold
|       `-- rw
|-- graph
|   |-- 27cf784147099545
|   |   |-- json
|   |   |-- layer
|   |   `-- layersize
|   |-- 610f32ad818cb0a1fa6179cde0d05f52476cc9a970d461a258d176883ec78972
|   |   |-- json
|   |   |-- layer
|   |   `-- layersize
|   |-- 6ced267c00ab923ccc4293bea283322c3c301416dabaf6c2bf57aff2140c04a9
|   |   |-- json
|   |   |-- layer
|   |   `-- layersize
|   |-- 8dbd9e392a964056420e5d58ca5cc376ef18e2de93b5cc90e868a1bbc8318c1c
|   |   |-- json
|   |   |-- layer
|   |   `-- layersize
|   |-- b750fe79269d2ec9a3c593ef05b4332b1d1a02a62b4accb2c21d589ff2f5f2dc
|   |   |-- json
|   |   |-- layer
|   |   `-- layersize
|   `-- _tmp
|       `-- _dockerinit
|-- repositories
`-- volumes

17 directories, 18 files
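If you would rather inspect this layout from a script than from tree, here is a minimal Python sketch. It assumes only what the listing above shows: container directories live under containers/ and image directories under graph/, with helper directories such as _tmp prefixed by an underscore. The function name list_docker_dir and the demo IDs are made up for illustration; the demo runs against a throwaway directory rather than the real /var/lib/docker.

```python
import os
import tempfile


def list_docker_dir(root):
    """Return (container_ids, image_ids) found under a Docker 0.6-style data dir."""
    def subdirs(path):
        if not os.path.isdir(path):
            return []
        return sorted(
            name for name in os.listdir(path)
            # Skip helper directories such as graph/_tmp.
            if os.path.isdir(os.path.join(path, name)) and not name.startswith("_")
        )

    return (subdirs(os.path.join(root, "containers")),
            subdirs(os.path.join(root, "graph")))


# Demo: build a tiny fake layout (hypothetical IDs) and list it.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "containers", "c0ffee", "rw"))
os.makedirs(os.path.join(demo_root, "graph", "abc123", "layer"))
os.makedirs(os.path.join(demo_root, "graph", "_tmp"))

containers, images = list_docker_dir(demo_root)
print("containers:", containers)
print("images:", images)
```

To try it on a real installation you would pass /var/lib/docker as root, which needs root privileges to read.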
Let’s go over this, directory by directory.
|-- containers
|   `-- 75..19
|       |-- 75..19-json.log
|       |-- config.json
|       |-- config.lxc
|       |-- hostconfig.json
|       |-- hostname
|       |-- hosts
|       |-- rootfs.hold
|       `-- rw
When you create a container, Docker randomly generates a unique ID for it and uses that ID as the directory name. It’s a long name, so I’ve shortened it here.
7554ecfa9e7cd89...-json.log is the container log file. It’s the container’s log, but in JSON format.
config.json holds the config; its contents are shown below. It’s basically standard Docker configuration. It’s Docker-specific and not used by LXC.
{ "Args" : [ ],
"Config" : { "AttachStderr" : true,
"AttachStdin" : true,
"AttachStdout" : true,
"Cmd" : [ "/bin/bash" ],
"CpuShares" : 0,
"Dns" : null,
"Domainname" : "",
"Entrypoint" : null,
"Env" : null,
"Hostname" : "7554ecfa9e7c",
"Image" : "rails",
"Memory" : 0,
"MemorySwap" : 0,
"NetworkDisabled" : false,
"OpenStdin" : true,
"PortSpecs" : [ "3000:3000" ],
"Privileged" : false,
"StdinOnce" : true,
"Tty" : true,
"User" : "",
"Volumes" : null,
"VolumesFrom" : "",
"WorkingDir" : ""
},
"Created" : "2013-09-24T07:43:01.37032591-04:00",
"HostnamePath" : "/var/lib/docker/containers/75..19/hostname",
"HostsPath" : "/var/lib/docker/containers/75..19/hosts",
"ID" : "75..19",
"Image" : "610f32ad818cb0a1fa6179cde0d05f52476cc9a970d461a258d176883ec78972",
"NetworkSettings" : { "Bridge" : "",
"Gateway" : "",
"IPAddress" : "",
"IPPrefixLen" : 0,
"PortMapping" : null
},
"Path" : "/bin/bash",
"ResolvConfPath" : "/etc/resolv.conf",
"State" : { "ExitCode" : 0,
"Ghost" : false,
"Pid" : 0,
"Running" : false,
"StartedAt" : "2013-09-24T08:13:33.34387465-04:00"
},
"SysInitPath" : "/usr/bin/docker",
"Volumes" : { },
"VolumesRW" : { }
}
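Since config.json is plain JSON, it is easy to pull the interesting bits out of it. The following Python sketch shows one way to do that. The field names (ID, Path, Args, Config.Image, Config.PortSpecs, State.Running) are taken from the file above; the summarize helper and the trimmed-down SAMPLE are my own illustration, not something Docker ships.

```python
import json

# A trimmed-down copy of the config.json shown above.
SAMPLE = """
{ "ID": "75..19",
  "Path": "/bin/bash",
  "Args": [],
  "Config": { "Image": "rails", "PortSpecs": ["3000:3000"], "Memory": 0, "Tty": true },
  "State": { "Running": false, "ExitCode": 0 }
}
"""


def summarize(config_text):
    """Extract a few human-relevant fields from a container's config.json."""
    cfg = json.loads(config_text)
    inner = cfg.get("Config", {})
    return {
        "id": cfg.get("ID"),
        "image": inner.get("Image"),
        # The full command line is the path plus its arguments.
        "command": [cfg.get("Path")] + list(cfg.get("Args") or []),
        "ports": inner.get("PortSpecs") or [],
        "running": cfg.get("State", {}).get("Running", False),
    }


summary = summarize(SAMPLE)
print(summary)
```

On a real host you would read the text from the container’s config.json file instead of SAMPLE.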
The config.lxc file contains the LXC configuration. It’s the container configuration that LXC uses when booting the container, and it’s LXC-specific. You can check man lxc.conf to see the syntax and the available configuration options.
# hostname
lxc.utsname = 7554ecfa9e7c
#lxc.aa_profile = unconfined
# network configuration
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = docker0
lxc.network.name = eth0
lxc.network.mtu = 1500
lxc.network.ipv4 = 172.17.0.15/16
# root filesystem
lxc.rootfs = /var/lib/docker/containers/75..19/rootfs
# enable domain name support
lxc.mount.entry = /var/lib/docker/containers/75..19/hostname /var/lib/docker/containers/75..19/rootfs/etc/hostname none bind,ro 0 0
lxc.mount.entry = /var/lib/docker/containers/75..19/hosts /var/lib/docker/containers/75..19/rootfs/etc/hosts none bind,ro 0 0
# use a dedicated pts for the container (and limit the number of pseudo terminal
# available)
lxc.pts = 1024
# disable the main console
lxc.console = none
# no controlling tty at all
lxc.tty = 1
# no implicit access to devices
lxc.cgroup.devices.deny = a
# /dev/null and zero
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm
# consoles
lxc.cgroup.devices.allow = c 5:1 rwm
lxc.cgroup.devices.allow = c 5:0 rwm
lxc.cgroup.devices.allow = c 4:0 rwm
lxc.cgroup.devices.allow = c 4:1 rwm
# /dev/urandom,/dev/random
lxc.cgroup.devices.allow = c 1:9 rwm
lxc.cgroup.devices.allow = c 1:8 rwm
# /dev/pts/* - pts namespaces are "coming soon"
lxc.cgroup.devices.allow = c 136:* rwm
lxc.cgroup.devices.allow = c 5:2 rwm
# tuntap
lxc.cgroup.devices.allow = c 10:200 rwm
# fuse
#lxc.cgroup.devices.allow = c 10:229 rwm
# rtc
#lxc.cgroup.devices.allow = c 254:0 rwm
# standard mount point
# WARNING: procfs is a known attack vector and should probably be disabled
# if your userspace allows it. eg. see http://blog.zx2c4.com/749
lxc.mount.entry = proc /var/lib/docker/containers/75..19/rootfs/proc proc nosuid,nodev,noexec 0 0
# WARNING: sysfs is a known attack vector and should probably be disabled
# if your userspace allows it. eg. see http://bit.ly/T9CkqJ
lxc.mount.entry = sysfs /var/lib/docker/containers/75..19/rootfs/sys sysfs nosuid,nodev,noexec 0 0
lxc.mount.entry = devpts /var/lib/docker/containers/75..19/rootfs/dev/pts devpts newinstance,ptmxmode=0666,nosuid,noexec 0 0
#lxc.mount.entry = varrun /var/lib/docker/containers/75..19/rootfs/var/run tmpfs mode=755,size=4096k,nosuid,nodev,noexec 0 0
#lxc.mount.entry = varlock /var/lib/docker/containers/75..19/rootfs/var/lock tmpfs size=1024k,nosuid,nodev,noexec 0 0
lxc.mount.entry = shm /var/lib/docker/containers/75..19/rootfs/dev/shm tmpfs size=65536k,nosuid,nodev,noexec 0 0
# Inject docker-init
lxc.mount.entry = /usr/bin/docker /var/lib/docker/containers/75..19/rootfs/.dockerinit none bind,ro 0 0
# In order to get a working DNS environment, mount bind (ro) the host's /etc/resolv.conf into the container
lxc.mount.entry = /etc/resolv.conf /var/lib/docker/containers/75..19/rootfs/etc/resolv.conf none bind,ro 0 0
# drop linux capabilities (apply mainly to the user root in the container)
# (Note: 'lxc.cap.keep' is coming soon and should replace this under the
# security principle 'deny all unless explicitly permitted', see
# http://sourceforge.net/mailarchive/message.php?msg_id=31054627 )
lxc.cap.drop = audit_control audit_write mac_admin mac_override mknod setfcap setpcap sys_admin \
sys_boot sys_module sys_nice sys_pacct sys_rawio sys_resource sys_time sys_tty_config
# limits
I’ll highlight a couple of interesting settings:
- Under the section # enable domain name support, it mounts the hostname and hosts files read-only.
- lxc.cgroup.devices.deny = a denies all devices; the lines under it list the allowed devices.
- It mounts procfs, sysfs, devpts and shm.
- It mounts /etc/resolv.conf to share DNS with the host.
- It drops Linux capabilities in the lxc.cap.drop section, in order to harden the container.
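To check settings like these without reading the whole file by eye, a few lines of Python are enough. This is a rough sketch, assuming the simple "key = value" line format seen in the config.lxc above, with # starting a comment. The parse_lxc_config helper is hypothetical and purely line-based: it does not understand values continued over several lines, and it is not how LXC itself parses the file.

```python
def parse_lxc_config(text):
    """Collect 'key = value' pairs; keys may repeat, so each maps to a list."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments (including commented-out settings) and stray lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, since values may contain '=' themselves.
        key, value = line.split("=", 1)
        settings.setdefault(key.strip(), []).append(value.strip())
    return settings


# A shortened excerpt in the style of the config.lxc above.
SAMPLE = """\
# no implicit access to devices
lxc.cgroup.devices.deny = a
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm
#lxc.cgroup.devices.allow = c 10:229 rwm
lxc.cap.drop = audit_control sys_admin sys_nice
"""

settings = parse_lxc_config(SAMPLE)
allowed_devices = settings["lxc.cgroup.devices.allow"]
dropped_caps = settings["lxc.cap.drop"][0].split()
print("allowed devices:", allowed_devices)
print("dropped capabilities:", dropped_caps)
```

Storing a list per key matters here because lxc.cgroup.devices.allow and lxc.mount.entry appear many times in one file.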
Looking further in the directory, we have a hostname file, which contains nothing more than the name of the host, and a hosts file, which contains:
127.0.0.1 7554ecfa9e7c
::1 7554ecfa9e7c
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Keeping the hosts file outside the container and bind-mounting it in basically ensures that we can modify it from the host.
The RootFS
How the rootfs gets created is still a bit of a mystery to me. It’s not visible in the config file, so maybe docker init does something?
In theory, I think it does this via AUFS:
container/rootfs = /images/my-root-fs/ (read-only) + /containers/my-container/rw (read-write)
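To make that formula concrete, here is a toy model of the read-only-plus-read-write idea in Python. It is only an analogy and not how AUFS is implemented: files are plain dictionary entries, the paths and contents are made up, and collections.ChainMap stands in for the union mount because it looks keys up layer by layer and sends every write to the first layer.

```python
from collections import ChainMap

# Read-only image layer: path -> contents (made-up example files).
image_layer = {"/etc/hostname": "base", "/bin/bash": "<binary>"}
# Per-container read-write layer; it starts out empty.
rw_layer = {}

# Lookups go left to right; writes always land in the first mapping (rw_layer).
rootfs = ChainMap(rw_layer, image_layer)

rootfs["/etc/hostname"] = "7554ecfa9e7c"  # shadows the image's copy
rootfs["/tmp/new-file"] = "hello"         # exists only in the rw layer

print(rootfs["/etc/hostname"])       # the container sees its own version
print(rootfs["/bin/bash"])           # untouched files come from the image
print(image_layer["/etc/hostname"])  # the image itself is unchanged
print(sorted(rw_layer))              # everything the container changed
```

The last line is the interesting one: the rw layer ends up holding exactly the container’s changes, which is what the rw directory in the container folder is for. Deleting a file that exists only in the image is not modeled here.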