Juno nova mid-cycle meetup summary: DB2 support

This post is one part of a series discussing the OpenStack Nova Juno mid-cycle meetup. It’s a bit shorter than most of the others, because the next thing on my list to talk about is DB2, and that’s relatively contained.

IBM is interested in adding DB2 support as a SQL database for Nova. Theoretically, this is a relatively simple thing to do because we use SQLAlchemy to abstract away the specifics of the SQL engine. However, in reality, the abstraction is leaky. The obvious example in this case is that DB2 has different rules for foreign keys than other SQL engines we’ve used. So, in order to be able to make this change, we need to tighten up our schema for the database.

The change that was discussed is the requirement that the UUID column on the instances table be not null. This seems like a relatively obvious thing to allow, given that UUID is the official way to identify an instance, and has been for a really long time. However, there are a few things which make this complicated: we need to understand the state of databases that might have been through a long chain of upgrades from previous Nova releases, and we need to ensure that the schema alterations don’t cause significant performance problems for existing large deployments.

As an aside, people sometimes complain that Nova development is too slow these days, and they’re probably right, because things like this slow us down. A relatively simple change to our database schema requires a whole bunch of performance testing and negotiation with operators to ensure that its not going to be a problem for people. It’s good that we do these things, but sometimes it’s hard to explain to people why forward progress is slow in these situations.

Matt Riedemann from IBM has been doing a good job of handling this change. He’s written a tool that operators can run before the change lands in Juno that checks if they have instance rows with null UUIDs. Additionally, the upgrade process has been well planned, and is documented in the specification available on the fancy pants new specs website.

We had a long discussion about this change at the meetup, and how it would impact on large deployments. Both Rackspace and HP were asked if they could run performance tests to see if the schema change would be a problem for them. Unfortunately HP’s testing hardware was tied up with another project, so we only got numbers from Rackspace. For them, the schema change took 42 minutes for a large database. Almost all of that was altering the column to be non-nullable; creating the new index was only 29 seconds of runtime. However, the Rackspace database is large because they don’t currently purge deleted rows, if they can get that done before running this schema upgrade then the impact will be much smaller.

So the recommendation here for operators is that it is best practice to purge deleted rows from your databases before an upgrade, especially when schema migrations need to occur at the same time. There are some other takeaways for operators as well: if we know that operators have a large deployment, then we can ask if an upgrade will be a problem. This is why being active on the openstack-operators mailing list is important. Additionally, if operators are willing to donate a dataset to Turbo-Hipster for DB CI testing, then we can use that in our automation to try and make sure these upgrades don’t cause you pain in the future.

In the next post in this series I’ll talk about the future of cells, and the work that needs to be done there to make it a first class citizen.

MySQL Tech Talks

Three intrepid MySQLers came to Google after the user conference to give internal tech talks. They were kind enough to agree to us hosting them for other people to see. The first two are up, so I’ll mention those now, and put a link to the last one when it’s available…

Click on the thumbnail to be taken to the video.

Jay Pipes is a co-author of the recently published Pro MySQL (Apress, 2005), which covers all of the newest MySQL 5 features, as well as in-depth discussion and analysis of the MySQL server architecture, storage engines, transaction processing, benchmarking, and advanced SQL scenarios. You can also see his name on articles appearing in Linux Magazine and can read more articles about MySQL at his website. Jay Pipes is MySQL’s Community Relations Manager for North America.

Learn where to best focus your attention when tuning the performance of your applications and database servers, and how to effectively find the “low hanging fruit” on the tree of bottlenecks. It’s not rocket science, but with a bit of acquired skill and experience, and of course good habits, you too can do this magic!

Timour Katchaounov

The first part of this talk describes the main principles behind MySQL’s query optimiser and execution engine, how the optimiser transforms queries into executable query plans, what these plans look like, and how they are executed.

The second part of the talk describes the major improvements in the query engine of MySQL 5.0, and how these improvements can benefit the users of MySQL 5.0. The “greedy” optimiser reduces compilation time for big queries with orders of magnitude. The “index merge” access method provides a way to use more than one index for the same query. For faster plan execution and to allow better join orders, the 5.0 optimiser transforms most outer joins into inner joins.

The outer joins that cannot be transformed into inner ones are executed in a pipeline manner, so that no intermediate results need to be materialised. Finally, some GROUP BY and DISTINCT queries can be executed much faster thanks to “loose index scan” technique that reads only a fraction of an index.

The talk concludes with the near-future plans for new features coming in the next versions of MySQL.

Stewart Smith works for MySQL AB as a software engineer working on MySQL Cluster. He is an active member of the free and open source software community, especially in Australia. Although Australian, he does not dress like Steve Irwin—although if he wrestled crocodiles he probably would. He is a fan of great coffee, great beer, and is currently 39,000 feet above sea level.

part 1 – Introduction to MySQL Cluster The NDB storage engine (MySQL Cluster) is a high-availability storage engine for MySQL. It provides synchronous replication between storage nodes and many mysql servers having a consistent view of the database. In 4.1 and 5.0 it’s a main memory database, but in 5.1 non-indexed attributes can be stored on disk. NDB also provides a lot of determinism in system resource usage. I’ll talk a bit about that.

part 2 – new features in 5.1 including cluster to cluster replication, disk based data and a bunch of other things. anybody that is attending the mysql users conference may find this eerily familiar.

I can also talk about latest-and-totally-greatest developments and future stuff we’re working on. i can also take questions and constructive abuse 🙂

You can see a complete list of the MySQL tech talks at Google here.

Update: added Stewart’s talk now that it is online.