Quick hack: extracting the contents of a Docker image to disk

Share

For various reasons, I wanted to inspect the contents of a Docker image without starting a container. Docker makes it easy to get an image as a tar file, like this:

docker save -o foo.tar image

But if you extract that tar file you’ll find a configuration file and manifest as JSON files, and then a series of tar files, one per image layer. You use the manifest to determine in what order you extract the tar files to build the container filesystem.

That’s fiddly and annoying. So I wrote this quick python hack to extract an image tarball into a directory on disk that I could inspect:

#!/usr/bin/python3

# Call me like this:
#  docker-image-extract tarfile.tar extracted

import tarfile
import json
import os
import sys

image_path = sys.argv[1]
extracted_path = sys.argv[2]

image = tarfile.open(image_path)
manifest = json.loads(image.extractfile('manifest.json').read())

for layer in manifest[0]['Layers']:
    print('Found layer: %s' % layer)
    layer_tar = tarfile.open(fileobj=image.extractfile(layer))

    for tarinfo in layer_tar:
        print('  ... %s' % tarinfo.name)
        if tarinfo.isdev():
            print('  --> skip device files')
            continue

        dest = os.path.join(extracted_path, tarinfo.name)
        if not tarinfo.isdir() and os.path.exists(dest):
            print('  --> remove old version of file')
            os.unlink(dest)

        layer_tar.extract(tarinfo, path=extracted_path)

Hopefully that’s useful to someone else (or future me).

Share

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.