Juno nova mid-cycle meetup summary: DB2 support

This post is one part of a series discussing the OpenStack Nova Juno mid-cycle meetup. It’s a bit shorter than most of the others, because the next thing on my list to talk about is DB2, and that’s relatively contained.

IBM is interested in adding DB2 support as a SQL database for Nova. Theoretically, this is a relatively simple thing to do because we use SQLAlchemy to abstract away the specifics of the SQL engine. However, in reality, the abstraction is leaky. The obvious example in this case is that DB2 has different rules for foreign keys than other SQL engines we’ve used. So, in order to be able to make this change, we need to tighten up our schema for the database.

The change that was discussed is the requirement that the UUID column on the instances table be not null. This seems like a relatively obvious thing to allow, given that UUID is the official way to identify an instance, and has been for a really long time. However, there are a few things which make this complicated: we need to understand the state of databases that might have been through a long chain of upgrades from previous Nova releases, and we need to ensure that the schema alterations don’t cause significant performance problems for existing large deployments.

As an aside, people sometimes complain that Nova development is too slow these days, and they’re probably right, because things like this slow us down. A relatively simple change to our database schema requires a whole bunch of performance testing and negotiation with operators to ensure that its not going to be a problem for people. It’s good that we do these things, but sometimes it’s hard to explain to people why forward progress is slow in these situations.

Matt Riedemann from IBM has been doing a good job of handling this change. He’s written a tool that operators can run before the change lands in Juno that checks if they have instance rows with null UUIDs. Additionally, the upgrade process has been well planned, and is documented in the specification available on the fancy pants new specs website.

We had a long discussion about this change at the meetup, and how it would impact on large deployments. Both Rackspace and HP were asked if they could run performance tests to see if the schema change would be a problem for them. Unfortunately HP’s testing hardware was tied up with another project, so we only got numbers from Rackspace. For them, the schema change took 42 minutes for a large database. Almost all of that was altering the column to be non-nullable; creating the new index was only 29 seconds of runtime. However, the Rackspace database is large because they don’t currently purge deleted rows, if they can get that done before running this schema upgrade then the impact will be much smaller.

So the recommendation here for operators is that it is best practice to purge deleted rows from your databases before an upgrade, especially when schema migrations need to occur at the same time. There are some other takeaways for operators as well: if we know that operators have a large deployment, then we can ask if an upgrade will be a problem. This is why being active on the openstack-operators mailing list is important. Additionally, if operators are willing to donate a dataset to Turbo-Hipster for DB CI testing, then we can use that in our automation to try and make sure these upgrades don’t cause you pain in the future.

In the next post in this series I’ll talk about the future of cells, and the work that needs to be done there to make it a first class citizen.

Exploring a single database migration

Yesterday I was having some troubles with a database migration download step, and a Joshua Hesketh suggested I step through the migrations one at a time and see what they were doing to my sqlite test database. That’s a great idea, but it wasn’t immediately obvious to me how to do it. Now that I’ve figured out the steps required, I thought I’d document them here.

First off we need a test environment. I’m hacking on nova at the moment, and tend to build throw away test environments in the cloud because its cheap and easy. So, I created a new Ubuntu 12.04 server instance in Rackspace’s Sydney data center, and then configured it like this:

    $ sudo apt-get update
    $ sudo apt-get install -y git python-pip git-review libxml2-dev libxml2-utils
    libxslt-dev libmysqlclient-dev pep8 postgresql-server-dev-9.1 python2.7-dev
    python-coverage python-netaddr python-mysqldb python-git virtualenvwrapper
    python-numpy virtualenvwrapper sqlite3
    $ source /etc/bash_completion.d/virtualenvwrapper
    $ mkvirtualenv migrate_204
    $ toggleglobalsitepackages

Simple! I should note here that we probably don’t need the virtualenv because this machine is disposable, but its still a good habit to be in. Now I need to fetch the code I am testing. In this case its from my personal fork of nova, and the git location to fetch will obviously change for other people:

    $ git clone http://github.com/mikalstill/nova

Now I can install the code under test. This will pull in a bunch of pip dependencies as well, so it takes a little while:

    $ cd nova
    $ python setup.py develop

Next we have to configure nova because we want to install specific database schema versions.

    $ mkdir /etc/nova
    $ sudo mkdir /etc/nova
    $ sudo vim /etc/nova/nova.conf
    $ sudo chmod -R ugo+rx /etc/nova

The contents of my nova.conf looks like this:

    $ cat /etc/nova/nova.conf
    sql_connection = sqlite:////tmp/foo.sqlite

Now I can step up to the version before the one I am testing:

    $ nova-manage db sync --version 203

You do the same thing but with a different version number to step somewhere else. Its also pretty easy to get the schema for a table under sqlite. I just do this:

    $ sqlite3 /tmp/foo.sqlite
    SQLite version 3.7.9 2011-11-01 00:52:41
    Enter ".help" for instructions
    Enter SQL statements terminated with a ";"
    sqlite> .schema instances
    CREATE TABLE "instances" (
            created_at DATETIME,
            updated_at DATETIME,

So there you go.

Disclaimer — I wouldn’t recommend upgrading to a specific version like this for real deployments, because the models in the code base wont match the tables. If you wanted to do that you’d need to work out what git commit added the version after the one you’ve installed, and then checkout the commit before that commit.

Nova database continuous integration

I’ve had some opportunity recently to spend a little quality time off line, and I spent some of that time working on a side project I’ve wanted to do for a while — continuous integration testing of nova database migrations. Now, the code isn’t perfect at the moment, but I think its an interesting direction to take and I will keep pursuing it.

One of the problems nova developers have is that we don’t have a good way of determining whether a database migration will be painful for deployers. We can eyeball code reviews, but whether code looks reasonable or not, its still hard to predict how it will perform on real data. Continuous integration is the obvious solution — if we could test patch sets on real databases as part of the code review process, then reviewers would have more data about whether to approve a patch set or not. So I did that.

At the moment the CI implementation I’ve built isn’t posting to code reviews, but that’s because I want to be confident that the information it gathers is accurate before wasting other reviewers’ time. You can see results at openstack.stillhq.com/ci. For now, I am keeping an eye on the test results and posting manually to reviews when an error is found — that has happened twice so far.

The CI tests work by restoring a MySQL database to a known good state, upgrading that database from Folsom to Grizzly (if needed). It then runs the upgrades already committed to trunk, and then the proposed patch set. Timings for each step are reported — for example with my biggest test database the upgrade from Folsom to Grizzly takes between about 7 and 9 minutes to run, which isn’t too bad. You can see an example log at here.

I’d be interested in know if anyone else has sample databases they’d like to see checks run against. If so, reach out to me and we can make it happen.

A historical note from November 2020: the links in this post to the OpenStack CI system have been broken for quite some time and have therefore been removed.

MythBuntu 8.10 just made me sad

I figured it was time to give MythBuntu a try, so I set up a MythBuntu 8.10 instance in VirtualBox today. That was a mistake. I’m not 100% sure I understand how it happened, but MythBuntu somehow managed to delete my entire mythconverg MySQL database instance. Not pleased. I’ve restored it from last night’s backup, but now I’ll need to recover recordings which happened today, assuming I can be bothered.

I’m writing this just as a warning to others — if you’re playing with MythBuntu, backup your MySQL instance if its not a test one.

Recovering lost MythTV recordings

I had one of those moments tonight, and accidentally dropped the mythconverg database on my production MythTV instance, not the development one. This made me sad. Luckily I had a backup which was only a week old (although I am now running night backups of that database).

Recovery wasn’t too bad once I wrote some code. The steps:

  • Restore from backup
  • Don’t run mythfilldatabase (it will clear out old guide data, and we need it)
  • Apply my funky patch to myth.rebuilddatabase.pl
  • Run myth.rebuilddatabase.pl
  • Run mythfilldatabase

And all is well again. The patch uses the guide data from the database to make an educated guess about the title, subtitle and description of the recordings which are missing from the database. Here’s the patch:

    Index: myth.rebuilddatabase.pl
    --- myth.rebuilddatabase.pl     (revision 11681)
    +++ myth.rebuilddatabase.pl     (working copy)
    @@ -185,6 +185,7 @@
    +print "db = dbi:mysql:database=$database:host=$dbhost user = $user pass = $pass\n";
     my $dbh = DBI->connect("dbi:mysql:database=$database:host=$dbhost",
                    "$user","$pass") or die "Cannot connect to database ($!)\n";
    @@ -314,6 +315,7 @@
         # have enough to look for an past recording?
         if ($ssecond) {
    +        print "Checking for a recording...\n";
             $starttime = "$syear$smonth$sday$shour$sminute$ssecond";
             my $guess = "select title, subtitle, description from oldrecorded where chanid=(?) and starttime=(?)";
    @@ -333,6 +335,24 @@
             print "End time:   $emonth/$eday/$eyear - $ehour:$eminute:$esecond\n";
    +    # what about checking for guide data?
    +    if($guess_description =~ /^Recovered file/) {
    +        print "Checking for guide data...\n";
    +        my $guess = "select title, subtitle, description from program where chanid='$channel'".
    +                    " and starttime='$syear-$smonth-$sday $shour:$sminute:$ssecond'";
    +        print "$guess\n";
    +        $sth = $dbh->prepare($guess);
    +        $sth->execute()
    +            or die "Could not execute ($guess)\n";
    +        if (my @row = $sth->fetchrow_array) {
    +            $guess_title = $row[0];
    +            $guess_subtitle = $row[1];
    +            $guess_description = $row[2];
    +            print "Using guide data informaton for defaults\n";
    +        }
    +    }
         my $newtitle = $guess_title;
         my $newsubtitle = $guess_subtitle;
         my $newdescription = $guess_description;

You can download the patch here or read the bug report.