I really wanted to like etcd, but Andy Pavlo was right

Andy Pavlo of the CMU Database Group is well known for saying that while NoSQL databases cycle in and out of popularity, all databases eventually iterate back to a SQL interface. It happened with MongoDB and Google’s BigTable, for example.

I think I have hit that point with etcd. Initially I ported from MySQL to etcd because I really wanted the inexpensive distributed locking and the ability to watch values. However, I never actually watch values in my code any more, and I now spend a huge amount of my time maintaining what my code calls “caches”, but which I can now see are just poorly implemented secondary indexes. The straw that broke the camel’s back was https://github.com/etcd-io/etcd/issues/9043, which changed etcd’s defaults so that it can only return 1.5 MB in an RPC request.

I therefore think it might be time for me to port back to a real SQL database, perhaps keeping etcd to manage distributed locks. Perhaps.

I need to think about this more to be honest, but I think I’ve hit the limit of what you can express in key / value pairs stored directly in etcd. I often want to look up items based on a portion of their value (the values are JSON), but that’s not possible in etcd without the extra indices that I now maintain. On the other hand, as I’ve grown as a programmer, I’ve come to really, really want the Chubby-style check-and-replace transactional multi-table update syntax that etcd offers and S3 recently introduced as well. So moving back to a pure SQL database would leave me missing that.
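To make the check-and-replace idea concrete, here is a minimal sketch in Python. It is not etcd's API: `ToyKV` and its `txn` method are hypothetical, an in-memory stand-in that loosely imitates comparing a key's revision before writing. The key names are invented for illustration too.

```python
import json
import threading


class ToyKV:
    """In-memory stand-in for a key/value store with per-key revisions.

    Hypothetical: this only imitates the idea of a revision-guarded
    transaction, it is not a client for any real store.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}      # key -> (value, revision of last write)
        self._revision = 0   # store-wide counter, bumped per successful txn

    def get(self, key):
        """Return (value, revision); a missing key reads as (None, 0)."""
        with self._lock:
            return self._data.get(key, (None, 0))

    def txn(self, expected, puts):
        """Apply every put only if each key in `expected` is still at the
        revision the caller read. Returns True if the writes were applied."""
        with self._lock:
            for key, revision in expected.items():
                if self._data.get(key, (None, 0))[1] != revision:
                    return False
            self._revision += 1
            for key, value in puts.items():
                self._data[key] = (value, self._revision)
            return True


kv = ToyKV()
kv.txn({}, {"/instances/i-1": json.dumps({"state": "created"})})

# Read, modify, then write the record and an index entry in one transaction.
raw, rev = kv.get("/instances/i-1")
record = json.loads(raw)
record["state"] = "running"
ok = kv.txn(
    {"/instances/i-1": rev},
    {
        "/instances/i-1": json.dumps(record),
        "/index/state/running/i-1": "",
    },
)

# A second writer still holding the old revision loses the race.
stale = kv.txn({"/instances/i-1": rev}, {"/instances/i-1": "{}"})
```

Here the first update succeeds and the stale one is rejected without touching the record. That guarantee across several keys at once is the part that is awkward to give up.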

One alternative to ditching etcd entirely would be to write an RPC service which sat in front of it and abstracted away the underlying data store. If I treated etcd as a storage engine, and then maintained the various indices in that abstracting layer, then I might get to a happier place. This would map somewhat to how modern databases are built, if we thought of the keys in etcd as page locations in a storage engine. etcd would be quite an expensive storage engine, however, given its in-memory-only attributes.
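A rough sketch of what such an abstracting layer might look like, in Python. Everything here is an assumption for illustration: `IndexedStore`, the `/objects/` and `/index/` key layout, and the use of a plain dict as the backend in place of etcd. A real version would need to wrap the writes in `put` in a single transaction so that records and index entries cannot drift apart.

```python
import json


class IndexedStore:
    """Hypothetical layer that owns both records and their secondary indexes."""

    def __init__(self, backend, indexed_fields):
        self.backend = backend              # any dict-like key/value store
        self.indexed_fields = indexed_fields

    def _index_key(self, field, value, object_id):
        # Assumes field values contain no "/" characters.
        return f"/index/{field}/{value}/{object_id}"

    def get(self, object_id):
        raw = self.backend.get(f"/objects/{object_id}")
        return None if raw is None else json.loads(raw)

    def put(self, object_id, record):
        # Drop index entries that pointed at the previous version.
        old = self.get(object_id)
        if old is not None:
            for field in self.indexed_fields:
                if field in old:
                    key = self._index_key(field, old[field], object_id)
                    self.backend.pop(key, None)
        # Write the record, then one empty-valued index key per indexed field.
        self.backend[f"/objects/{object_id}"] = json.dumps(record)
        for field in self.indexed_fields:
            if field in record:
                self.backend[self._index_key(field, record[field], object_id)] = ""

    def find(self, field, value):
        """Return matching object ids using a key-prefix scan of the index."""
        prefix = f"/index/{field}/{value}/"
        return sorted(k[len(prefix):] for k in self.backend if k.startswith(prefix))


store = IndexedStore({}, ["state"])
store.put("i-1", {"state": "running", "node": "n1"})
store.put("i-2", {"state": "stopped", "node": "n1"})
store.put("i-1", {"state": "stopped", "node": "n1"})
stopped = store.find("state", "stopped")
running = store.find("state", "running")
```

Callers only ever see `put`, `get` and `find`, so the index bookkeeping lives in one place instead of being scattered through the application as ad hoc caches.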

Oh, and you should all go and watch Andy Pavlo’s excellent lectures on how to build a database storage engine.
