Drizzle Developer Day

I spent the day at the Drizzle Developer Day at Sun’s insane asylum campus. I’m not joking here, the campus was apparently a former insane asylum. First off I battled getting Drizzle to compile on Ubuntu 8.10, where the secret sauce appears to be to know about the drizzle-developer PPA. If you’re using Ubuntu 8.10, add this to your sources.list and life will be a bit better:

deb http://ppa.launchpad.net/drizzle-developers/ppa/ubuntu intrepid main
deb-src http://ppa.launchpad.net/drizzle-developers/ppa/ubuntu intrepid main

After that compiling drizzle was pretty easy.

Discovering the CASE statement

In an effort to speed up my database updates, I’ve been looking for ways to batch some of my updates. CASE seems like the way to go:

mysql> create table bar(a tinyint, b tinyint);
Query OK, 0 rows affected (0.02 sec)

mysql> insert into bar(a) values(1), (2), (3), (4), (5);
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> select * from bar;
+------+------+
| a    | b    |
+------+------+
|    1 | NULL |
|    2 | NULL |
|    3 | NULL |
|    4 | NULL |
|    5 | NULL |
+------+------+
5 rows in set (0.00 sec)

mysql> update bar set b = case a
    ->   when 1 then 42
    ->   when 2 then 43
    ->   when 3 then 44
    ->   else 45
    ->   end;
Query OK, 5 rows affected (0.00 sec)
Rows matched: 5  Changed: 5  Warnings: 0

mysql> select * from bar;
+------+------+
| a    | b    |
+------+------+
|    1 |   42 |
|    2 |   43 |
|    3 |   44 |
|    4 |   45 |
|    5 |   45 |
+------+------+
5 rows in set (0.00 sec)

I see stuff online which warns not to forget the else, otherwise you get a default of null, so I guess I should bear that caveat in mind…

Estimating the progress of queries on MySQL

I’ve been doing a lot of batch updates on one of my databases at home recently. show processlist says something like this:

mysql> show processlist;
+-------+------+---------------+--------------+---------+-------+----------+------------------------------------------+
| Id    | User | Host          | db           | Command | Time  | State    | Info                                     |
+-------+------+---------------+--------------+---------+-------+----------+------------------------------------------|
| 18354 | root | maui:37403    | smtp_servers | Query   | 57234 | Updating | update ips_218 set reverse_lookup = null |
| 22286 | root | maui:37348    | smtp_servers | Query   | 38103 | Updating | update ips_80 set reverse_lookup = null, |
| 22851 | root | maui:54982    | smtp_servers | Query   | 34091 | Updating | update ips_19 set reverse_lookup = null, |
| 23351 | root | molokai:58232 | smtp_servers | Sleep   |    57 |          | NULL                                     |
| 23496 | root | maui:40923    | smtp_servers | Query   | 29973 | Updating | update ips_62 set reverse_lookup = null, |
| 23906 | root | maui:38068    | smtp_servers | Query   | 26794 | Updating | update ips_83 set reverse_lookup = null, |
| 25675 | root | maui:56438    | smtp_servers | Query   | 12505 | Updating | update ips_82 set reverse_lookup = null, |
| 25846 | root | maui:41334    | smtp_servers | Query   | 10948 | Updating | update ips_90 set reverse_lookup = null, |
| 26437 | root | maui:41139    | smtp_servers | Query   |  6211 | Updating | update ips_66 set reverse_lookup = null, |
| 26773 | root | maui:32885    | smtp_servers | Query   |  3526 | Updating | update ips_76 set reverse_lookup = null, |
| 27073 | root | maui:42607    | smtp_servers | Query   |  1148 | Updating | update ips_11 set reverse_lookup = null, |
| 27202 | root | molokai:50688 | smtp_servers | Query   |     0 | NULL     | show processlist                         |
| 27203 | root | molokai:50689 | smtp_servers | Sleep   |     2 |          | NULL                                     |
+-------+------+---------------+--------------+---------+-------+----------+------------------------------------------+
14 rows in set (0.20 sec)

Now, wouldn’t it be nice if MySQL provided some extra information about the progress of those queries? Like for example the number of rows which have been updated so far, or an estimate of how long the query has left to run? I’m ok with such queries not being very accurate, but I assume the storage engine has to have some idea of how many rows are in the table and how many it has touched already.

Perhaps something like this already exists and I haven’t noticed? I’m using innodb if that matters.

Update: it seems like innodb can answer this question for me:

mysql> show engine innodb status \G;
...
---TRANSACTION 0 40056, ACTIVE 39794 sec, process no 22984, OS thread id 3020733328 waiting in InnoDB queue
mysql tables in use 1, locked 1
6672 lock struct(s), heap size 748864, undo log entries 909825
MySQL thread id 22851, query id 351217 maui 192.168.1.93 root Updating
update ips_19 set reverse_lookup = null, reverse = null, reverse_extracted
...

That doesn’t give you an estimate of percentage complete though. I assume there is a 1:1 correlation between undo row entries and rows altered by the query?

Update: my imperical observation is that the undo rows are not 100% correlated to the number of rows your query changed. Its correlated to the number of rows that were changed kinda near your query. For example, if you’re doing an update, then the number is good enough to trust. However, if you’re doing a select, then the number seems to be the number of rows someone else changed while your select was running (i.e. how many old versions needed to be kept around because of your select transaction).

Also, Jeremy Cole to the rescue!

Is there any way to access the match text in MySQL rlike selects?

Hi. I am doing a select like this in MySQL 5:

    select * from foo where bar rlike '(.*),(.*)';
    

The specific example here is made up. Anyway, I’d like to be able to get to the matched text from bar, like I can with various languages regexp libraries. Is this functionality exposed at all in MySQL? I’ve looked at the docs and can’t see any indication that it is, so this might just be wishful thinking.

MySQL Users Conference

Well, they’re definitely thinking about getting started. Like last year I caught the VTA down — it’s hard to beat a $1.75 trip without having to worry about traffic. Registraton wasn’t as smooth this year as last, for example I didn’t get my free book (there didn’t seem to be any attempt to hand those out to speakers). Whatever.

I’m now waiting for the replication talk to start.

Managing MySQL the Slack Way: How Google Deploys New MySQL Servers

I’ll be presenting about Slack (the open sourced tool kit we use for deployment software configuration) at the MySQL user’s conference in Santa Clara in late April. The talk will focus on the interesting aspects of Slack as it relates to MySQL and should be fun. A DBA mate of mine is gonna present with me, so it should be a barrel of laughs.

MySQL scaling: query snipers

I’ve been wanting a query sniper for a while for MySQL (some people seem to call them “query killers”). They’re basically programs which inspect the MySQL “show processlist” command and decide to kill some processes if they violate specified rules. I have a prototype of one in python, and couldn’t find any other implementations until tonight when I came across Querybane, which is what Wikipedia use to do exactly the same thing.

Now, I need to think further about if querybane does what I want, especially given I can’t find the source code. Perhaps it’s hidden in the media wiki code somewhere? The only code for a project named “servmon” (it’s parent project) that I can find is four years old and in PHP.

(Oh, by the way, the Wikipedia MySQL docs are really interesting, for example this page has some nice practical advice about replication log corruption. I wish I had found this before now for two reasons: it’s nice to see how other people do this stuff; and there is some stuff I can totally steal from here.)

So, does anyone know where I can find recent querybane code? Alternatively, what do other people use as a query sniper? Is it worth open sourcing mine?

Update: Apparently what I needed to do was sleep on it, and use Google Code Search. The querybane code appears to be at http://svn.wikimedia.org/svnroot/mediawiki/trunk/servmon/. In the end it’s not going to work for me, as it seems to be really based around a world in which you use IRC to keep a track of the state of your servers, which isn’t something I want to do.

They all use MySQL…

I was walking down Mountain View’s Castro Street this afternoon, and noticed that meebo is advertising for developers and system admins. Interestingly, they seem to match the design pattern used by pretty much every web 2.0 company I have seen around here (except MySpace): linux, MySQL, and Ajax. So, there you go.