Interpreting whiteout files in Docker image layers

I’ve been playing with Docker images and their internal layers a little more over the last week — you can see some of my previous adventures at Manipulating Docker images without Docker installed. The general thrust of these adventures is to understand the image format and how to manipulate it, by building a tool called Occy Strap which can do useful things with that format. My eventual goal is to be able to build OCI compliant image bundles and then have a container runtime like runc execute them, and I must say I am getting a lot closer.

This time I was interested in the exact mechanisms used by whiteout files in those layers and how that interacts with Linux kernel overlay filesystem types.

Firstly, what is a whiteout file? Well, when you delete a file or directory from a lower layer in a Docker image, it doesn’t actually get removed from that lower layer, as layers are immutable. Instead, the uppermost layer records that the file or directory has been removed, and it is therefore no longer visible in the Docker image that the container sees. This has obvious security implications if you delete a file like a password that you needed during your container build process, although there are probably better ways to deal with those problems, such as multi-stage Docker builds.

An image might help with the description:

Here we have a container image which is composed of four layers. Layer 1 creates two files, /a and /b. Layer 2 creates a directory, /c. Layer 3 deletes /a and creates /c/d. Finally, Layer 4 deletes /c and /c/d — let’s assume that it does this by just deleting the /c directory recursively. As far as a container using this image would be concerned, only /b exists in the container image.

A Dockerfile (which wouldn’t actually work) to create this set of history might look like:

FROM scratch
RUN touch /a /b            # Layer 1
RUN mkdir /c               # Layer 2
RUN rm /a && touch /c/d    # Layer 3
RUN rm -rf /c              # Layer 4

The Docker image format stores each layer as a tarfile, with that tarfile being what a Linux filesystem called AUFS would have stored for this scenario. AUFS was an early Linux overlay filesystem from around 2006, which never actually merged into the mainline Linux kernel, although it is available on Ubuntu because they maintain a patch. AUFS recorded the deletion of a file by creating a “whiteout file”, which was the name of the file prepended with .wh. — so when we deleted /a, AUFS would have created a file named .wh.a in Layer 3. Similarly, to recursively delete a directory, it used a whiteout file with the name of the directory.

What if I wanted to replace a directory? AUFS provided an “opaque directory” mechanism that ensured the directory remained, but all of its previous content was hidden. This was done by adding a file named .wh..wh..opq to the directory being made opaque.
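
To make those naming conventions concrete, here is a minimal Python sketch (not part of Occy Strap; the layer.tar filename is just a placeholder) which classifies the entries in a single layer tarball into whiteouts, opaque directory markers, and ordinary content:

import os
import tarfile

WHITEOUT_PREFIX = '.wh.'
OPAQUE_MARKER = '.wh..wh..opq'

# Walk a single AUFS style layer tarball and report what each entry means.
with tarfile.open('layer.tar') as layer:
    for member in layer:
        base = os.path.basename(member.name)
        parent = os.path.dirname(member.name)

        if base == OPAQUE_MARKER:
            print('opaque directory: /%s' % parent)
        elif base.startswith(WHITEOUT_PREFIX):
            print('deleted entry:    /%s'
                  % os.path.join(parent, base[len(WHITEOUT_PREFIX):]))
        else:
            print('regular entry:    /%s' % member.name)

Run over the Layer 3 and Layer 4 tarballs listed below, something like this would report /a as deleted in Layer 3, and /c/d and /c as deleted in Layer 4.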

You can read quite a lot more about the Docker image format in the specification, as well as the quite interesting documentation on whiteout files.

To finish this example, the contents of the tarfile for each layer should look like this:

# Layer 1
/a                 # a file
/b                 # a file

# Layer 2
/c                 # a directory
/c/.wh..wh..opq    # a file, created as a safety measure

# Layer 3
/.wh.a             # a file
/c/d               # a file

# Layer 4
/c/.wh.d           # a file
/.wh.c             # a file

So that’s all great, but it’s not actually what got me bothered. You see, modern Docker uses overlayfs, which is the replacement for AUFS that actually made it into the mainline Linux kernel. overlayfs has a similar whiteout mechanism, but it is not the same as the one in AUFS. Specifically, deleted files are recorded as character devices with 0/0 device numbers, and opaque directories (directories whose lower layer contents should be hidden) are recorded with an extended filesystem attribute named “trusted.overlay.opaque” set to “y”. What I wanted to find was the code in Docker which transcodes the AUFS style whiteouts in the layer tarballs into this overlayfs format on the filesystem while creating a container.
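
For reference, here is roughly what those two overlayfs mechanisms look like when you create them by hand. This is a sketch rather than anything Docker ships: the function names and the upper_dir argument are mine, you need to be root (mknod and the trusted.* xattr namespace both require privilege), and the directory needs to sit on a filesystem that supports extended attributes.

import os
import stat

def overlay_whiteout(upper_dir, name):
    # overlayfs records a deleted file or directory as a character device
    # with device number 0/0 and the original name.
    os.mknod(os.path.join(upper_dir, name),
             mode=stat.S_IFCHR | 0o600,
             device=os.makedev(0, 0))

def overlay_opaque(upper_dir, name):
    # An opaque directory hides everything lower layers placed inside it,
    # and is marked with the trusted.overlay.opaque xattr set to "y".
    path = os.path.join(upper_dir, name)
    os.makedirs(path, exist_ok=True)
    os.setxattr(path, 'trusted.overlay.opaque', b'y')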

After a bit of digging (the code is in containerd, not moby as I had expected), the answer is here:

func OverlayConvertWhiteout(hdr *tar.Header, path string) (bool, error) {
	base := filepath.Base(path)
	dir := filepath.Dir(path)

	// if a directory is marked as opaque, we need to translate that to overlay
	if base == whiteoutOpaqueDir {
		// don't write the file itself
		return false, unix.Setxattr(dir, "trusted.overlay.opaque", []byte{'y'}, 0)
	}

	// if a file was deleted and we are using overlay, we need to create a character device
	if strings.HasPrefix(base, whiteoutPrefix) {
		originalBase := base[len(whiteoutPrefix):]
		originalPath := filepath.Join(dir, originalBase)

		if err := unix.Mknod(originalPath, unix.S_IFCHR, 0); err != nil {
			return false, err
		}
		// don't write the file itself
		return false, os.Chown(originalPath, hdr.Uid, hdr.Gid)
	}

	return true, nil
}

Effectively, as a layer tarball is extracted, the AUFS style whiteouts are transcoded into overlayfs’ format. So there you go.

A final note for implementers of random Docker image tools: the test suite looks quite useful here if you want to validate that what you do matches what Docker does.

Manipulating Docker images without Docker installed

Recently I’ve been playing a bit more with Docker images and Docker image repositories. I had in the past written a quick hack to let me extract files from a Docker image, but I wanted to do something a little more mature than that.

For example, sometimes you want to download an image from a Docker image repository without using Docker. Naively, if you had Docker, you’d do something like this:

docker pull busybox
docker save -o busybox.tar busybox

However, that assumes that you have Docker installed on the machine downloading the images, and that’s sometimes not possible for security reasons. The most obvious example I can think of is airgapped secure environments where you need to walk the data between two networks, and the unclassified network machine doesn’t allow administrator access to install Docker.

So I wrote a little tool to do image manipulation for me. The tool is called Occy Strap, is written in Python, and is available on PyPI. That means installing it is relatively simple:

python3 -m venv ~/virtualenvs/occystrap
. ~/virtualenvs/occystrap/bin/activate
pip install occystrap

Which doesn’t require administrator permissions. There are then a few things we can do with Occy Strap.

Downloading an image from a repository and storing as a tarball

Let’s say we want to download an image from a repository and store it as a local tarball. This is a common thing to want to do in airgapped environments, for example. You could do this with Docker using docker pull; docker save. The Occy Strap equivalent is:

occystrap fetch-to-tarfile registry-1.docker.io library/busybox \
    latest busybox.tar

In this example we’re pulling from the Docker Hub (registry-1.docker.io), and are downloading busybox’s latest version into a tarball named busybox.tar. This tarball can be loaded with docker load -i busybox.tar in an airgapped Docker environment.
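
Under the hood this is just the Docker registry v2 HTTP API. As a rough illustration of what a tool like this has to do (this is not Occy Strap’s actual code, and it ignores error handling, other registries, and multi-architecture manifest lists), fetching the busybox manifest and listing its layer digests looks something like:

import requests

image = 'library/busybox'
tag = 'latest'

# The Docker Hub requires a token, even for anonymous pulls of public images.
token = requests.get(
    'https://auth.docker.io/token',
    params={'service': 'registry.docker.io',
            'scope': 'repository:%s:pull' % image}).json()['token']

headers = {
    'Authorization': 'Bearer %s' % token,
    'Accept': 'application/vnd.docker.distribution.manifest.v2+json',
}

# The manifest names the config blob and the layer blobs by digest.
manifest = requests.get(
    'https://registry-1.docker.io/v2/%s/manifests/%s' % (image, tag),
    headers=headers).json()

for layer in manifest['layers']:
    # Each layer tarball would then be fetched with
    # GET /v2/<image>/blobs/<digest> using the same token.
    print('%s (%d bytes)' % (layer['digest'], layer['size']))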

Downloading an image from a repository and storing as an extracted tarball

The tarball in the previous example contains two JSON configuration files and a series of image layers, each itself a tarball, inside the main tarball. You can write these elements to a directory instead of to a tarball if you’d like to inspect them. For example:

occystrap fetch-to-extracted registry-1.docker.io library/centos 7 \
    centos7

This example will pull the CentOS image with the tag “7” from the Docker Hub, and write the content to a directory in the current working directory called “centos7”. If you tarred centos7 like this, you’d end up with a tarball equivalent to what fetch-to-tarfile produces, which could therefore be loaded with docker load:

cd centos7; tar -cf ../centos7.tar *

Downloading an image from a repository and storing it in a merged directory

In scenarios where image layers are likely to be reused between images (for example many images which share a common base layer), you can save disk space by downloading images to a directory which contains more than one image. To make this work, you need to instruct Occy Strap to use unique names for the JSON elements within the image file:

occystrap fetch-to-extracted --use-unique-names registry-1.docker.io \
    homeassistant/home-assistant latest merged_images
occystrap fetch-to-extracted --use-unique-names registry-1.docker.io \
    homeassistant/home-assistant stable merged_images
occystrap fetch-to-extracted --use-unique-names registry-1.docker.io \
    homeassistant/home-assistant 2021.3.0.dev20210219 merged_images

Each of these images includes 21 layers, but at the time of writing there are only 25 unique layers in the merged_images directory. You end up with a layout like this:

0465ae924726adc52c0216e78eda5ce2a68c42bf688da3f540b16f541fd3018c
10556f40181a651a72148d6c643ac9b176501d4947190a8732ec48f2bf1ac4fb
...
catalog.json 
cd8d37c8075e8a0195ae12f1b5c96fe4e8fe378664fc8943f2748336a7d2f2f3 
d1862a2c28ec9e23d88c8703096d106e0fe89bc01eae4c461acde9519d97b062 
d1ac3982d662e038e06cc7e1136c6a84c295465c9f5fd382112a6d199c364d20.json 
... 
d81f69adf6d8aeddbaa1421cff10ba47869b19cdc721a2ebe16ede57679850f0.json 
...
manifest-homeassistant_home-assistant-2021.3.0.dev20210219.json 
manifest-homeassistant_home-assistant-latest.json
manifest-homeassistant_home-assistant-stable.json

catalog.json is an Occy Strap specific artefact which maps which layers are used by which image. Each of the manifest files for the various images has also been given a unique name instead of manifest.json.
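
Assuming each of those renamed manifests keeps the structure of a docker save manifest.json (a JSON list whose entries have a Layers key — the same structure the quick hack at the end of this page relies on), you could measure how much sharing you’re actually getting with something like this sketch:

import glob
import json

referenced = []
for manifest_path in glob.glob('merged_images/manifest-*.json'):
    with open(manifest_path) as f:
        manifest = json.load(f)
    # Each manifest entry lists its layer tarballs in order.
    referenced.extend(manifest[0]['Layers'])

print('%d layer references, %d unique layers'
      % (len(referenced), len(set(referenced))))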

To extract a single image from such a shared directory, use the recreate-image command:

occystrap recreate-image merged_images homeassistant/home-assistant \
    latest ha-latest.tar

Exploring the contents of layers and overwritten files

Similarly, if you’d like the layers expanded from their tarballs onto the filesystem, you can pass the --expand argument to fetch-to-extracted. This will also create a directory named for the manifest which contains the final state of the image (the layers applied sequentially). For example:

occystrap fetch-to-extracted --expand quay.io \
    ukhomeofficedigital/centos-base latest ukhomeoffice-centos

Note that layers record the deletion of files from previous layers with whiteout files named “.wh.$previousfilename”. These whiteout files are not processed in the expanded layers, so that they remain visible to the user. They are however processed in the merged layer named for the manifest file.
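
If you’re curious what that processing involves, the core of the merge step is just applying each layer tarball in manifest order and honouring the whiteout names as you go. Here’s a simplified sketch (not Occy Strap’s actual implementation; the function name and directory arguments are mine, and edge cases like hard links and device files are ignored):

import os
import shutil
import tarfile

def apply_layer(layer_path, merged_dir):
    # Apply one layer tarball to the merged directory, processing AUFS style
    # whiteouts instead of writing them to disk.
    with tarfile.open(layer_path) as layer:
        for member in layer:
            base = os.path.basename(member.name)
            parent = os.path.dirname(member.name)

            if base == '.wh..wh..opq':
                # Opaque directory: discard whatever lower layers put here.
                target = os.path.join(merged_dir, parent)
                if os.path.isdir(target):
                    shutil.rmtree(target)
                os.makedirs(target, exist_ok=True)
            elif base.startswith('.wh.'):
                # Whiteout: remove the named file or directory from the merge.
                target = os.path.join(merged_dir, parent, base[len('.wh.'):])
                if os.path.isdir(target) and not os.path.islink(target):
                    shutil.rmtree(target)
                elif os.path.lexists(target):
                    os.unlink(target)
            else:
                layer.extract(member, path=merged_dir)

Layers are applied in the order the manifest lists them, so a whiteout or opaque marker in a later layer always wins over content from an earlier one.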

Quick hack: extracting the contents of a Docker image to disk

Hello! Please note I’ve since written a little Python tool called Occy Strap (described above) which makes this a bit easier, and can do some fancy things around importing and exporting multiple images. You might want to read about it?

For various reasons, I wanted to inspect the contents of a Docker image without starting a container. Docker makes it easy to get an image as a tar file, like this:

docker save -o foo.tar image

But if you extract that tar file you’ll find a configuration file and manifest as JSON files, and then a series of tar files, one per image layer. You use the manifest to determine in what order you extract the tar files to build the container filesystem.

That’s fiddly and annoying. So I wrote this quick Python hack to extract an image tarball into a directory on disk that I could inspect:

#!/usr/bin/python3

# Call me like this:
#  docker-image-extract tarfile.tar extracted

import tarfile
import json
import os
import sys

image_path = sys.argv[1]
extracted_path = sys.argv[2]

image = tarfile.open(image_path)
manifest = json.loads(image.extractfile('manifest.json').read())

for layer in manifest[0]['Layers']:
    print('Found layer: %s' % layer)
    layer_tar = tarfile.open(fileobj=image.extractfile(layer))

    for tarinfo in layer_tar:
        print('  ... %s' % tarinfo.name)
        if tarinfo.isdev():
            print('  --> skip device files')
            continue

        dest = os.path.join(extracted_path, tarinfo.name)
        if not tarinfo.isdir() and os.path.exists(dest):
            print('  --> remove old version of file')
            os.unlink(dest)

        layer_tar.extract(tarinfo, path=extracted_path)

Hopefully that’s useful to someone else (or future me).

Configuring docker to use rexray and Ceph for persistent storage

For various reasons I wanted to play with docker containers backed by persistent Ceph storage. rexray seemed like the way to do that, so here are my notes on getting that working…

First off, I needed to install rexray:

    root@labosa:~/rexray# curl -sSL https://dl.bintray.com/emccode/rexray/install | sh
    Selecting previously unselected package rexray.
    (Reading database ... 177547 files and directories currently installed.)
    Preparing to unpack rexray_0.9.0-1_amd64.deb ...
    Unpacking rexray (0.9.0-1) ...
    Setting up rexray (0.9.0-1) ...
    
    rexray has been installed to /usr/bin/rexray
    
    REX-Ray
    -------
    Binary: /usr/bin/rexray
    Flavor: client+agent+controller
    SemVer: 0.9.0
    OsArch: Linux-x86_64
    Branch: v0.9.0
    Commit: 2a7458dd90a79c673463e14094377baf9fc8695e
    Formed: Thu, 04 May 2017 07:38:11 AEST
    
    libStorage
    ----------
    SemVer: 0.6.0
    OsArch: Linux-x86_64
    Branch: v0.9.0
    Commit: fa055d6da595602715bdfd5541b4aa6d4dcbcbd9
    Formed: Thu, 04 May 2017 07:36:11 AEST
    

Which is of course horrid. What that script seems to have done is install a deb’d version of rexray based on an alien’d package:

    root@labosa:~/rexray# dpkg -s rexray
    Package: rexray
    Status: install ok installed
    Priority: extra
    Section: alien
    Installed-Size: 36140
    Maintainer: Travis CI User <travis@testing-gce-7fbf00fc-f7cd-4e37-a584-810c64fdeeb1>
    Architecture: amd64
    Version: 0.9.0-1
    Depends: libc6 (>= 2.3.2)
    Description: Tool for managing remote & local storage.
     A guest based storage introspection tool that
     allows local visibility and management from cloud
     and storage platforms.
     .
     (Converted from a rpm package by alien version 8.86.)
    

If I was building anything more than a test environment I think I’d want to do a better job of installing rexray than this, so you’ve been warned.

Next, we need to configure rexray to use Ceph. The configuration details are cunningly hidden in the libstorage docs, and aren’t mentioned at all in the rexray docs, so you probably want to take a look at the libstorage docs on ceph. First off, we need to install the ceph tools, and copy the ceph authentication information from the Ceph cluster we installed using openstack-ansible earlier.

    root@labosa:/etc# apt-get install ceph-common
    root@labosa:/etc# scp -rp 172.29.239.114:/etc/ceph .
    The authenticity of host '172.29.239.114 (172.29.239.114)' can't be established.
    ECDSA key fingerprint is SHA256:SA6U2fuXyVbsVJIoCEHL+qlQ3xEIda/MDOnHOZbgtnE.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added '172.29.239.114' (ECDSA) to the list of known hosts.
    rbdmap                       100%   92     0.1KB/s   00:00
    ceph.conf                    100%  681     0.7KB/s   00:00
    ceph.client.admin.keyring    100%   63     0.1KB/s   00:00
    ceph.client.glance.keyring   100%   64     0.1KB/s   00:00
    ceph.client.cinder.keyring   100%   64     0.1KB/s   00:00
    ceph.client.cinder-backup.keyring   71     0.1KB/s   00:00
    root@labosa:/etc# modprobe rbd
    

You also need to configure rexray. My first attempt looked like this:

    root@labosa:/var/log# cat /etc/rexray/config.yml
    libstorage:
      service: ceph
    

And the rexray output sure made it look like it worked…

    root@labosa:/etc# rexray service start
    ● rexray.service - rexray
       Loaded: loaded (/etc/systemd/system/rexray.service; enabled; vendor preset: enabled)
       Active: active (running) since Mon 2017-05-29 10:14:07 AEST; 33ms ago
     Main PID: 477423 (rexray)
        Tasks: 5
       Memory: 1.5M
          CPU: 9ms
       CGroup: /system.slice/rexray.service
               └─477423 /usr/bin/rexray start -f
    
    May 29 10:14:07 labosa systemd[1]: Started rexray.
    

Which looked good, but /var/log/syslog said:

    May 29 10:14:08 labosa rexray[477423]: REX-Ray
    May 29 10:14:08 labosa rexray[477423]: -------
    May 29 10:14:08 labosa rexray[477423]: Binary: /usr/bin/rexray
    May 29 10:14:08 labosa rexray[477423]: Flavor: client+agent+controller
    May 29 10:14:08 labosa rexray[477423]: SemVer: 0.9.0
    May 29 10:14:08 labosa rexray[477423]: OsArch: Linux-x86_64
    May 29 10:14:08 labosa rexray[477423]: Branch: v0.9.0
    May 29 10:14:08 labosa rexray[477423]: Commit: 2a7458dd90a79c673463e14094377baf9fc8695e
    May 29 10:14:08 labosa rexray[477423]: Formed: Thu, 04 May 2017 07:38:11 AEST
    May 29 10:14:08 labosa rexray[477423]: libStorage
    May 29 10:14:08 labosa rexray[477423]: ----------
    May 29 10:14:08 labosa rexray[477423]: SemVer: 0.6.0
    May 29 10:14:08 labosa rexray[477423]: OsArch: Linux-x86_64
    May 29 10:14:08 labosa rexray[477423]: Branch: v0.9.0
    May 29 10:14:08 labosa rexray[477423]: Commit: fa055d6da595602715bdfd5541b4aa6d4dcbcbd9
    May 29 10:14:08 labosa rexray[477423]: Formed: Thu, 04 May 2017 07:36:11 AEST
    May 29 10:14:08 labosa rexray[477423]: time="2017-05-29T10:14:08+10:00" level=error
    msg="error starting libStorage server" error.driver=ceph time=1496016848215
    May 29 10:14:08 labosa rexray[477423]: time="2017-05-29T10:14:08+10:00" level=error
    msg="default module(s) failed to initialize" error.driver=ceph time=1496016848216
    May 29 10:14:08 labosa rexray[477423]: time="2017-05-29T10:14:08+10:00" level=error
    msg="daemon failed to initialize" error.driver=ceph time=1496016848216
    May 29 10:14:08 labosa rexray[477423]: time="2017-05-29T10:14:08+10:00" level=error
    msg="error starting rex-ray" error.driver=ceph time=1496016848216
    

That’s because the service is called rbd, it seems. So the config file ended up looking like this:

    root@labosa:/var/log# cat /etc/rexray/config.yml
    libstorage:
      service: rbd
    
    rbd:
      defaultPool: rbd
    

Now to install docker:

    root@labosa:/var/log# sudo apt-get update
    root@labosa:/var/log# sudo apt-get install linux-image-extra-$(uname -r) \
        linux-image-extra-virtual
    root@labosa:/var/log# sudo apt-get install apt-transport-https \
        ca-certificates curl software-properties-common
    root@labosa:/var/log# curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    root@labosa:/var/log# sudo add-apt-repository \
        "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
        $(lsb_release -cs) \
        stable"
    root@labosa:/var/log# sudo apt-get update
    root@labosa:/var/log# sudo apt-get install docker-ce
    

Now let’s make a rexray volume. Note that a size of 1 here means 1GB.

    root@labosa:/var/log# rexray volume ls
    ID  Name  Status  Size
    root@labosa:/var/log# docker volume create --driver=rexray --name=mysql \
        --opt=size=1
    mysql
    root@labosa:/var/log# rexray volume ls
    ID         Name   Status     Size
    rbd.mysql  mysql  available  1
    

Let’s start the container.

    root@labosa:/var/log# docker run --name some-mysql --volume-driver=rexray \
        -v mysql:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql
    Unable to find image 'mysql:latest' locally
    latest: Pulling from library/mysql
    10a267c67f42: Pull complete
    c2dcc7bb2a88: Pull complete
    17e7a0445698: Pull complete
    9a61839a176f: Pull complete
    a1033d2f1825: Pull complete
    0d6792140dcc: Pull complete
    cd3adf03d6e6: Pull complete
    d79d216fd92b: Pull complete
    b3c25bdeb4f4: Pull complete
    02556e8f331f: Pull complete
    4bed508a9e77: Pull complete
    Digest: sha256:2f4b1900c0ee53f344564db8d85733bd8d70b0a78cd00e6d92dc107224fc84a5
    Status: Downloaded newer image for mysql:latest
    ccc251e6322dac504e978f4b95b3787517500de61eb251017cc0b7fd878c190b
    

And now to prove that persistence works and that there’s nothing up my sleeve…


    root@labosa:/var/log# docker run -it --link some-mysql:mysql --rm mysql \
        sh -c 'exec mysql -h"$MYSQL_PORT_3306_TCP_ADDR" \
        -P"$MYSQL_PORT_3306_TCP_PORT" -uroot -p"$MYSQL_ENV_MYSQL_ROOT_PASSWORD"'
    mysql: [Warning] Using a password on the command line interface can be insecure.
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 3
    Server version: 5.7.18 MySQL Community Server (GPL)
    
    Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    mysql> show databases;
    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    | mysql              |
    | performance_schema |
    | sys                |
    +--------------------+
    4 rows in set (0.00 sec)
    
    mysql> create database demo;
    Query OK, 1 row affected (0.03 sec)
    
    mysql> use demo;
    Database changed
    mysql> create table foo(val char(5));
    Query OK, 0 rows affected (0.14 sec)
    
    mysql> insert into foo(val) values ('a'), ('b'), ('c');
    Query OK, 3 rows affected (0.08 sec)
    Records: 3  Duplicates: 0  Warnings: 0
    
    mysql> select * from foo;
    +------+
    | val  |
    +------+
    | a    |
    | b    |
    | c    |
    +------+
    3 rows in set (0.00 sec)
    

Now let’s re-create the container and prove the data remains.

    root@labosa:/var/log# docker stop some-mysql
    some-mysql
    root@labosa:/var/log# docker rm some-mysql
    some-mysql
    root@labosa:/var/log# docker run --name some-mysql --volume-driver=rexray \
        -v mysql:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql
    99a7ccae1ad1865eb1bcc8c757251903dd2f1ac7d3ce4e365b5cdf94f539fe05
    
    root@labosa:/var/log# docker run -it --link some-mysql:mysql --rm mysql \
        sh -c 'exec mysql -h"$MYSQL_PORT_3306_TCP_ADDR" \
        -P"$MYSQL_PORT_3306_TCP_PORT" -uroot -p"$MYSQL_ENV_MYSQL_ROOT_PASSWORD"'
    mysql: [Warning] Using a password on the command line interface can be insecure.
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 3
    Server version: 5.7.18 MySQL Community Server (GPL)
    
    Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    mysql> use demo;
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A
    
    Database changed
    mysql> select * from foo;
    +------+
    | val  |
    +------+
    | a    |
    | b    |
    | c    |
    +------+
    3 rows in set (0.00 sec)
    

So there you go.