How it works: Docker

This is the first post in the series ‘How it works’, where I explain how cool software works under the hood. Here we take a look at Docker, which is written in Go.

What is Docker?

Docker is basically a convenience tool around LXC and AUFS: it takes care of things like mounting filesystems and starting the container with lxc-start. It allows you to virtualize apps and run them in isolation in containers. We are looking at Docker version 0.6.2, build 081543c.
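To make "starting the container with lxc-start" concrete, here is a minimal Go sketch of building such a command without running it. It is an assumption-heavy illustration: the container name my-container is hypothetical, -n and -f are standard lxc-start options (container name and config file), and starting /.dockerinit inside the container is inferred from the bind mount in the config.lxc shown later, not taken from Docker's source.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// dockerRoot is Docker's state directory, as explored later in this post.
const dockerRoot = "/var/lib/docker"

// lxcStartCmd builds, but does not run, an lxc-start invocation for a container.
// -n names the container and -f points LXC at the generated config.lxc.
// Everything after "--" runs inside the container. Starting /.dockerinit first
// is an assumption based on config.lxc; the real Docker invocation passes
// more arguments than this sketch does.
func lxcStartCmd(id string, args ...string) *exec.Cmd {
	configPath := filepath.Join(dockerRoot, "containers", id, "config.lxc")
	params := append([]string{"-n", id, "-f", configPath, "--", "/.dockerinit"}, args...)
	return exec.Command("lxc-start", params...)
}

func main() {
	// Print the command line instead of running it; actually starting it needs root and LXC.
	cmd := lxcStartCmd("my-container", "/bin/bash")
	fmt.Println(cmd.Args)
}
```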

LXC is not that hard to understand; it’s just a bit intimidating because it lumps a lot of semi-complicated Linux features (cgroups, capabilities, namespaces and so on) together into one crazy piece of functionality.

Why am I reviewing Docker?

Containers are awesome and they are the future. Containers allow applications to be virtualized without the overhead of a virtual machine. They are native, medium-portable (portability depends on the hardware) and medium-isolated (Docker breakouts are a thing), but they can be hardened. Containers will be the de facto deployment mechanism of the future.

Docker is the first tool that tries to popularize this concept. However, I think the current implementation of Docker isn’t explicit enough. Nevertheless, let’s get started and check how it works under the hood, so you can write your own Docker.

Overview

There are a couple of things you need to understand before we take a deep dive:

First of all, Docker uses LXC containers. You can find info on LXC by typing man lxc or by checking out the resources below.
Docker creates an LXC config file named config.lxc for each container; it configures networking and the mounting of apps.
Docker uses AUFS (Another Union File System) to layer filesystems, which gives you a unified, combined view across multiple mount points.
Docker images are basically root filesystems that an LXC container runs in.
Docker mounts an image’s root filesystem read-only using a union filesystem and creates a separate rw directory for the container, where all changes your container makes to this filesystem are written.
Docker drops Linux capabilities in config.lxc for the processes running in the container, so that they can’t use sys_nice, for instance.
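The layering idea can be sketched as a small in-memory model in Go. This is not how AUFS is implemented; the layer and unionFS types are hypothetical and only show the rule that reads fall through from the container’s rw layer to the read-only image layers, while writes always land in the rw layer. Deletions (which AUFS handles with whiteout files) are left out.

```go
package main

import "fmt"

// layer is a hypothetical in-memory stand-in for one directory in a union
// mount: it maps file paths to file contents.
type layer map[string]string

// unionFS is a simplified model of the view described above: read-only image
// layers with a single writable layer on top. It is not AUFS itself.
type unionFS struct {
	rw       layer   // per-container writable layer (the container's rw directory)
	readOnly []layer // image layers, topmost first
}

// Read returns the file from the highest layer that contains it.
func (u *unionFS) Read(path string) (string, bool) {
	if data, ok := u.rw[path]; ok {
		return data, true
	}
	for _, l := range u.readOnly {
		if data, ok := l[path]; ok {
			return data, true
		}
	}
	return "", false
}

// Write always goes to the writable layer; image layers are never modified.
func (u *unionFS) Write(path, data string) {
	u.rw[path] = data
}

func main() {
	base := layer{"/etc/issue": "Ubuntu", "/bin/bash": "ELF..."}
	fs := &unionFS{rw: layer{}, readOnly: []layer{base}}

	fs.Write("/etc/issue", "my container")
	got, _ := fs.Read("/etc/issue")
	fmt.Println(got)                // my container (served from the rw layer)
	fmt.Println(base["/etc/issue"]) // Ubuntu (the image layer is unchanged)
}
```

Because the image layers are never written to, many containers can share the same image while each keeps its own changes in its own rw directory.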

Let’s get started

If you install Docker as described on the website for Ubuntu Raring, you can check out its file structure and see how it works like this:

First, sudo your way into a root shell:

sudo -i

Then go to:

cd /var/lib/docker/

The directory structure looks like this (I used the tree -L 3 command; if you don’t have tree, install it with apt-get install tree):

.
|-- containers
|   `-- 75..19
|       |-- 75..19-json.log
|       |-- config.json
|       |-- config.lxc
|       |-- hostconfig.json
|       |-- hostname
|       |-- hosts
|       |-- rootfs.hold
|       `-- rw
|-- graph
|   |-- 27cf784147099545
|   |   |-- json
|   |   |-- layer
|   |   `-- layersize
|   |-- 610f32ad818cb0a1fa6179cde0d05f52476cc9a970d461a258d176883ec78972
|   |   |-- json
|   |   |-- layer
|   |   `-- layersize
|   |-- 6ced267c00ab923ccc4293bea283322c3c301416dabaf6c2bf57aff2140c04a9
|   |   |-- json
|   |   |-- layer
|   |   `-- layersize
|   |-- 8dbd9e392a964056420e5d58ca5cc376ef18e2de93b5cc90e868a1bbc8318c1c
|   |   |-- json
|   |   |-- layer
|   |   `-- layersize
|   |-- b750fe79269d2ec9a3c593ef05b4332b1d1a02a62b4accb2c21d589ff2f5f2dc
|   |   |-- json
|   |   |-- layer
|   |   `-- layersize
|   `-- _tmp
|       `-- _dockerinit
|-- repositories
`-- volumes

17 directories, 18 files

Let’s go over this, directory by directory.

|-- containers
|   `-- 75..19
|       |-- 75..19-json.log
|       |-- config.json
|       |-- config.lxc
|       |-- hostconfig.json
|       |-- hostname
|       |-- hosts
|       |-- rootfs.hold
|       `-- rw

When you create a container, Docker generates a random ID for it and uses that ID as the directory name. It’s a long name, so I’ve shortened it here.

7554ecfa9e7cd89...-json.log is the container’s log file. It’s a regular log, but in JSON format.

config.json holds the container’s standard Docker configuration. It’s Docker-specific and not used by LXC. This is what it contains:

{ "Args" : [  ],
  "Config" : { "AttachStderr" : true,
      "AttachStdin" : true,
      "AttachStdout" : true,
      "Cmd" : [ "/bin/bash" ],
      "CpuShares" : 0,
      "Dns" : null,
      "Domainname" : "",
      "Entrypoint" : null,
      "Env" : null,
      "Hostname" : "7554ecfa9e7c",
      "Image" : "rails",
      "Memory" : 0,
      "MemorySwap" : 0,
      "NetworkDisabled" : false,
      "OpenStdin" : true,
      "PortSpecs" : [ "3000:3000" ],
      "Privileged" : false,
      "StdinOnce" : true,
      "Tty" : true,
      "User" : "",
      "Volumes" : null,
      "VolumesFrom" : "",
      "WorkingDir" : ""
    },
  "Created" : "2013-09-24T07:43:01.37032591-04:00",
  "HostnamePath" : "/var/lib/docker/containers/75..19/hostname",
  "HostsPath" : "/var/lib/docker/containers/75..19/hosts",
  "ID" : "75..19",
  "Image" : "610f32ad818cb0a1fa6179cde0d05f52476cc9a970d461a258d176883ec78972",
  "NetworkSettings" : { "Bridge" : "",
      "Gateway" : "",
      "IPAddress" : "",
      "IPPrefixLen" : 0,
      "PortMapping" : null
    },
  "Path" : "/bin/bash",
  "ResolvConfPath" : "/etc/resolv.conf",
  "State" : { "ExitCode" : 0,
      "Ghost" : false,
      "Pid" : 0,
      "Running" : false,
      "StartedAt" : "2013-09-24T08:13:33.34387465-04:00"
    },
  "SysInitPath" : "/usr/bin/docker",
  "Volumes" : {  },
  "VolumesRW" : {  }
}
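Since config.json is plain JSON, you can read it with Go’s encoding/json. The containerConfig struct below is a hypothetical subset whose field names are copied from the file above; encoding/json silently ignores the fields it doesn’t declare. The program name readconfig in the usage message is also hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// containerConfig mirrors a small subset of config.json; field names match the file.
type containerConfig struct {
	ID     string
	Image  string
	Path   string
	Config struct {
		Hostname  string
		Cmd       []string
		PortSpecs []string
	}
	State struct {
		Running bool
		Pid     int
	}
}

// parseConfig decodes the subset of fields declared above.
func parseConfig(data []byte) (*containerConfig, error) {
	var c containerConfig
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	return &c, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: readconfig /var/lib/docker/containers/<id>/config.json")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	c, err := parseConfig(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("id=%s image=%s cmd=%v ports=%v running=%v\n",
		c.ID, c.Image, c.Config.Cmd, c.Config.PortSpecs, c.State.Running)
}
```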

config.lxc contains the LXC configuration: it’s the container configuration that LXC uses when booting the container, and it’s LXC-specific. You can check man lxc.conf to see the syntax and the available configuration options.

# hostname

lxc.utsname = 7554ecfa9e7c

#lxc.aa_profile = unconfined

# network configuration
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = docker0
lxc.network.name = eth0
lxc.network.mtu = 1500
lxc.network.ipv4 = 172.17.0.15/16

# root filesystem

lxc.rootfs = /var/lib/docker/containers/75..19/rootfs

# enable domain name support
lxc.mount.entry = /var/lib/docker/containers/75..19/hostname /var/lib/docker/containers/75..19/rootfs/etc/hostname none bind,ro 0 0

lxc.mount.entry = /var/lib/docker/containers/75..19/hosts /var/lib/docker/containers/75..19/rootfs/etc/hosts none bind,ro 0 0

# use a dedicated pts for the container (and limit the number of pseudo terminal
# available)
lxc.pts = 1024

# disable the main console
lxc.console = none

# no controlling tty at all
lxc.tty = 1

# no implicit access to devices
lxc.cgroup.devices.deny = a

# /dev/null and zero
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm

# consoles
lxc.cgroup.devices.allow = c 5:1 rwm
lxc.cgroup.devices.allow = c 5:0 rwm
lxc.cgroup.devices.allow = c 4:0 rwm
lxc.cgroup.devices.allow = c 4:1 rwm

# /dev/urandom,/dev/random
lxc.cgroup.devices.allow = c 1:9 rwm
lxc.cgroup.devices.allow = c 1:8 rwm

# /dev/pts/* - pts namespaces are "coming soon"
lxc.cgroup.devices.allow = c 136:* rwm
lxc.cgroup.devices.allow = c 5:2 rwm

# tuntap
lxc.cgroup.devices.allow = c 10:200 rwm

# fuse
#lxc.cgroup.devices.allow = c 10:229 rwm

# rtc
#lxc.cgroup.devices.allow = c 254:0 rwm

# standard mount point
#  WARNING: procfs is a known attack vector and should probably be disabled
#           if your userspace allows it. eg. see http://blog.zx2c4.com/749

lxc.mount.entry = proc /var/lib/docker/containers/75..19/rootfs/proc proc nosuid,nodev,noexec 0 0

#  WARNING: sysfs is a known attack vector and should probably be disabled
#           if your userspace allows it. eg. see http://bit.ly/T9CkqJ

lxc.mount.entry = sysfs /var/lib/docker/containers/75..19/rootfs/sys sysfs nosuid,nodev,noexec 0 0

lxc.mount.entry = devpts /var/lib/docker/containers/75..19/rootfs/dev/pts devpts newinstance,ptmxmode=0666,nosuid,noexec 0 0

#lxc.mount.entry = varrun /var/lib/docker/containers/75..19/rootfs/var/run tmpfs mode=755,size=4096k,nosuid,nodev,noexec 0 0

#lxc.mount.entry = varlock /var/lib/docker/containers/75..19/rootfs/var/lock tmpfs size=1024k,nosuid,nodev,noexec 0 0

lxc.mount.entry = shm /var/lib/docker/containers/75..19/rootfs/dev/shm tmpfs size=65536k,nosuid,nodev,noexec 0 0

# Inject docker-init
lxc.mount.entry = /usr/bin/docker /var/lib/docker/containers/75..19/rootfs/.dockerinit none bind,ro 0 0

# In order to get a working DNS environment, mount bind (ro) the host's /etc/resolv.conf into the container
lxc.mount.entry = /etc/resolv.conf /var/lib/docker/containers/75..19/rootfs/etc/resolv.conf none bind,ro 0 0

# drop linux capabilities (apply mainly to the user root in the container)
#  (Note: 'lxc.cap.keep' is coming soon and should replace this under the
#         security principle 'deny all unless explicitly permitted', see
#         http://sourceforge.net/mailarchive/message.php?msg_id=31054627 )

lxc.cap.drop = audit_control audit_write mac_admin mac_override mknod setfcap setpcap sys_admin sys_boot sys_module sys_nice sys_pacct sys_rawio sys_resource sys_time sys_tty_config

# limits

I’ll highlight a couple of interesting settings:

Under the section # enable domain name support, it bind-mounts the hostname and hosts files read-only.
lxc.cgroup.devices.deny = a denies access to all devices; the lines below it then list the devices that are allowed.
It mounts procfs, sysfs, devpts and shm.
It bind-mounts the host’s /etc/resolv.conf read-only to share DNS settings with the host.
It drops Linux capabilities in the lxc.cap.drop line in order to harden the container.
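The deny-all-then-allow device rules behave like a whitelist. Here is a simplified Go model of how a rule such as c 136:* rwm matches a device; it ignores details of the kernel’s cgroup device controller (for example, a wildcard major number), and deviceRule, parseRule and allowed are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// deviceRule is one "lxc.cgroup.devices.allow" value such as "c 1:3 rwm".
type deviceRule struct {
	kind         byte // 'c' for character devices, 'b' for block devices
	major, minor int  // minor == -1 stands for the "*" wildcard
	access       string
}

func parseRule(s string) (deviceRule, error) {
	fields := strings.Fields(s)
	if len(fields) != 3 || len(fields[0]) != 1 {
		return deviceRule{}, fmt.Errorf("bad rule %q", s)
	}
	majMin := strings.SplitN(fields[1], ":", 2)
	if len(majMin) != 2 {
		return deviceRule{}, fmt.Errorf("bad major:minor in %q", s)
	}
	major, err := strconv.Atoi(majMin[0])
	if err != nil {
		return deviceRule{}, err
	}
	minor := -1 // "*" matches any minor number
	if majMin[1] != "*" {
		if minor, err = strconv.Atoi(majMin[1]); err != nil {
			return deviceRule{}, err
		}
	}
	return deviceRule{kind: fields[0][0], major: major, minor: minor, access: fields[2]}, nil
}

// allowed reports whether the requested access (e.g. "rw") is permitted when
// everything is denied ("devices.deny = a") except the listed rules.
func allowed(rules []deviceRule, kind byte, major, minor int, access string) bool {
	for _, r := range rules {
		if r.kind != kind || r.major != major || (r.minor != -1 && r.minor != minor) {
			continue
		}
		ok := true
		for _, a := range access {
			if !strings.ContainsRune(r.access, a) {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	// A subset of the allow lines from the config.lxc above.
	var rules []deviceRule
	for _, line := range []string{"c 1:3 rwm", "c 1:5 rwm", "c 136:* rwm", "c 10:200 rwm"} {
		r, err := parseRule(line)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		rules = append(rules, r)
	}
	fmt.Println("/dev/null  (c 1:3):   ", allowed(rules, 'c', 1, 3, "rw"))    // true
	fmt.Println("/dev/pts/5 (c 136:5): ", allowed(rules, 'c', 136, 5, "rw"))  // true
	fmt.Println("fuse       (c 10:229):", allowed(rules, 'c', 10, 229, "rw")) // false
}
```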

Looking further in the directory, there is a hostname file, which contains nothing more than the container’s hostname, and a hosts file, which contains:

127.0.0.1	7554ecfa9e7c
::1		7554ecfa9e7c

127.0.0.1	localhost
::1		localhost ip6-localhost ip6-loopback
fe00::0		ip6-localnet
ff00::0		ip6-mcastprefix
ff02::1		ip6-allnodes
ff02::2		ip6-allrouters

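This basically ensures that the hosts file can be modified from outside the container: it lives in the container’s directory on the host and is only bind-mounted read-only into the container.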

The RootFS

How the rootfs gets created is still a bit of a mystery. The config file only points lxc.rootfs at it and doesn’t show how it is mounted, so maybe dockerinit does something?

My theory is that it does this via AUFS, roughly like this:

container/rootfs = /images/my-root-fs/(read-only) + /containers/my-container/rw(read-write)
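If that theory holds, the mount would look roughly like the output of the sketch below, which builds the AUFS branch option in Go. It assumes AUFS’s br: syntax, where the leftmost branch has the highest priority and each branch is suffixed with =rw or =ro; the paths are the hypothetical ones from the formula above, aufsBranches is a hypothetical name, and nothing here confirms that Docker does exactly this.

```go
package main

import (
	"fmt"
	"strings"
)

// aufsBranches builds the "br:" mount option for an AUFS union: the writable
// directory first (highest priority), then the read-only layers, topmost first.
func aufsBranches(rw string, readOnly []string) string {
	parts := []string{rw + "=rw"}
	for _, dir := range readOnly {
		parts = append(parts, dir+"=ro")
	}
	return "br:" + strings.Join(parts, ":")
}

func main() {
	opts := aufsBranches("/containers/my-container/rw", []string{"/images/my-root-fs"})
	// Print the equivalent shell command; running it needs root and an AUFS-enabled kernel.
	fmt.Printf("mount -t aufs -o %s none /containers/my-container/rootfs\n", opts)
}
```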

Resources