Further adventures with base images in OpenStack

I was bored over the New Year’s weekend, so I figured I’d have a go at implementing image cache management as discussed previously. I actually have an implementation of about 75% of that blueprint now, but it’s not ready for prime time yet. The point of this post is more to document some stuff I learnt about VM startup along the way so I don’t forget it later.

So, you want to start a VM on a compute node. Once the scheduler has selected a node to run the VM on, the next step is the compute instance on that machine starting the VM up. First the specified disk image is fetched from your image service (in my case glance), and placed in a temporary location on disk. If the image is already a raw image, it is then renamed to the correct name in the instances/_base directory. If it isn’t a raw image then it is converted to raw format, and that converted file is put in the right place. Optionally, the image can be extended to a specified size as part of this process.

Then, depending on whether you have copy on write (COW) images turned on or not, either a COW version of the file is created inside the instances/$instance/ directory, or the file from _base is copied to instances/$instance.
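The steps above can be sketched as a couple of shell functions. This is only an illustration: it prints the qemu-img commands that roughly correspond to each step rather than running them, and the function names, paths and exact commands are my assumptions, not what nova actually executes.

```shell
#!/bin/sh
# Sketch only: print the commands that roughly match the steps described
# above. Function names and paths are hypothetical.

# plan_base_image FETCHED FORMAT BASE [SIZE]
plan_base_image() {
    fetched=$1; format=$2; base=$3; size=${4:-}
    if [ "$format" = "raw" ]; then
        # Already raw: the fetched file is just renamed into _base.
        echo "mv $fetched $base"
    else
        # Anything else is converted to raw on the way into _base.
        echo "qemu-img convert -O raw $fetched $base"
    fi
    if [ -n "$size" ]; then
        # Optional step: extend the base image to the requested size.
        echo "qemu-img resize -f raw $base $size"
    fi
}

# plan_instance_disk BASE INSTANCE_DIR USE_COW
plan_instance_disk() {
    base=$1; instance_dir=$2; use_cow=$3
    if [ "$use_cow" = "yes" ]; then
        # COW: a qcow2 overlay whose backing file is the image in _base.
        echo "qemu-img create -f qcow2 -F raw -b $base $instance_dir/disk"
    else
        # No COW: the instance gets a full copy of the base image.
        echo "cp $base $instance_dir/disk"
    fi
}
```

For example, `plan_base_image /tmp/fetched qcow2 /var/lib/nova/instances/_base/abc 10G` prints a convert command followed by a resize command. Note that both of those steps change the bytes on disk compared to what was fetched.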

This has a side effect that had me confused for quite a while yesterday: the checksums, and even the file sizes, stored in glance are not reliable indicators of base image corruption, because the file in _base may have been converted to raw or extended after it was fetched. Most of my confusion was because image files in glance are immutable, so how come they differed from what’s on disk? The other problem was that the images I was using on my development machine were raw images, so the checksums did match. It was only when I moved to a slightly more complicated environment that I had enough data to work out what was happening.

We therefore have a problem for that blueprint. We can’t use the checksums from glance as a reliable indicator of whether something has gone wrong with the base image. I need to come up with something nicer. What this probably means for the first cut of the code is that checksums will only be verified for raw images which weren’t extended, but I haven’t written that code yet.
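To make the limited check concrete, here is a minimal sketch of what verifying a raw, un-extended base image could look like. It assumes the checksum glance reports is an MD5 hex digest and that md5sum is available; the function name is made up for illustration, and this is not the code from the blueprint.

```shell
# verify_base_image FILE EXPECTED_MD5
# Hypothetical helper: exits 0 if the file's MD5 matches the expected
# value, 1 if it differs, and 2 if the file does not exist.
# Only meaningful for raw images which were not extended.
verify_base_image() {
    file=$1; expected=$2
    [ -f "$file" ] || return 2
    # md5sum prints "<digest>  <filename>"; keep just the digest.
    actual=$(md5sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}
```

For a converted or extended image this check would fail even when nothing is wrong, which is exactly the problem described above.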

So, there we go.

OpenStack compute node cleanup

I’ve never used OpenStack before, which I imagine is true of many other people out there. It’s actually pretty cool, although I encountered a problem the other day that I think is worthy of some more documentation. OpenStack runs virtual machines for users, in much the same manner as Amazon’s EC2 system. These instances are started with a base image, and then copy on write is used to write differences for the instance as it changes stuff. This makes sense in a world where a given machine might be running more than one copy of the instance.

However, I encountered a compute node which was running low on disk space. This is because there is currently nothing which cleans up these base images, so even if none of the instances on a machine require that image, and even if the machine is experiencing disk stress, the images still hang around. There are a few blog posts out there about this, but nothing really definitive that I could find. I’ve filed a bug asking for the Ubuntu package to include some sort of cleanup script, and interestingly that led me to learn that there are plans for a pretty comprehensive image management system. Unfortunately, it doesn’t seem that anyone is working on this at the moment. I would offer to lend a hand, but it’s not clear to me as an OpenStack n00b where I should start. If you read this and have some pointers, feel free to contact me.

Anyways, we still need to clean up that node experiencing disk stress. It turns out that nova uses qemu for its copy on write disk images. We can therefore ask qemu which base images are in use. It goes something like this:

    $ cd /var/lib/nova/instances
    # Collect the backing file of every instance disk, de-duplicated.
    $ find . -name "disk*" | xargs -n1 qemu-img info | grep 'backing file:' | \
      sed -e 's/.*file: //' -e 's/ .*//' | sort -u > /tmp/inuse

/tmp/inuse will now contain a list of the images in _base that are in use at the moment. Now you can change to the base directory, which defaults to /var/lib/nova/instances/_base, and do some cleanup. What I do is look for large image files which are several days old. I then check if they appear in that temporary file I created, and if they don’t I delete them.
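The manual check described above could be sketched as a small shell function. This is a rough illustration rather than a tested cleanup tool: the function name and thresholds are hypothetical, it assumes GNU find, and it assumes the paths in the in-use list are written the same way find prints them (absolute paths under _base). It only lists candidates and deletes nothing.

```shell
# list_stale_base_images BASE_DIR INUSE_FILE MIN_AGE_DAYS MIN_SIZE_MB
# Hypothetical helper: print base images that are older than MIN_AGE_DAYS,
# larger than MIN_SIZE_MB, and not listed in INUSE_FILE.
list_stale_base_images() {
    base_dir=$1; inuse=$2; min_days=$3; min_mb=$4
    find "$base_dir" -maxdepth 1 -type f \
        -mtime +"$min_days" -size +"${min_mb}M" |
    while IFS= read -r image; do
        # Skip anything an instance disk still names as its backing file.
        if ! grep -qxF "$image" "$inuse"; then
            echo "$image"
        fi
    done
}
```

Something like `list_stale_base_images /var/lib/nova/instances/_base /tmp/inuse 3 100` would then print the candidates, which you can eyeball before removing anything.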

I’m sure that this could be better automated by a simple Python script, but I haven’t gotten around to it yet. If I do, I will be sure to mention it here.

Color ebook!

By far the most consistent criticism of The Definitive Guide to ImageMagick has been that the sample images need to be in color. I would have to agree with this point, which is why I am delighted that Apress took the time to go back around the production process and produce a version of the ebook with color images. It’s cool that they were willing to put in the effort, and not only that, they’re giving anyone who has purchased the ebook to date a free upgrade. Even better, now if you buy the printed book on Amazon, you get the color ebook for free!

I have a limited number of color ebooks to give away, so if you’re interested please leave a comment and explain why you’d like one.

Ruby sample source code

This is the source code for the imwizard application demonstrated in Chapter 10 of The Definitive Guide to ImageMagick. It is a small Ruby script which demonstrates how to build an interactive program which builds a list of commands to apply to images, and then applies those commands to many images as specified by a regular expression.

  • Extracted source
  • Tarball
  • Zip file

ImageMagick book – Chapter 2: Basic Image Manipulation

I’m meant to be writing the rest of chapter seven tonight, but I thought I would warm up by continuing with my promised series of posts about the content of the book. The next chapter in the list is chapter two, which covers simple image manipulations. The idea was to get the stuff which everyone wants to do and cover it as soon as possible so that people can get some runs on the board (so to speak). In chapter two you will find an introduction to the bits of imaging theory that we need for the book (rasters, vectors, bitmaps, pixels, you get the idea).

Then I move on to talk about ways to change the size of images. This includes resizing, sampling, cropping, scaling, thumbnailing and so forth. We also discuss some interesting transformations like trim. Then we move on to making an image larger, before finishing up with how to process many images at once with ImageMagick.

It’s an interesting chapter in that it’s immediately useful, and goes through some interesting theory matters. It also sets the stage for the later coverage of all the other cool stuff you can do with ImageMagick. As a point of interest, this is also the chapter I wrote to determine how long it takes to write a chapter, which was an interesting experience.

Anyways, on with chapter seven methinks.