I don’t blog about every Shaken Fist release here, but I do feel like the 0.4 release (and the subsequent minor bug fix release 0.4.1) are a pretty big deal in the life of the project.
The focus of the v0.4 series is reliability — we’ve used behaviour in the continuous integration pipeline as a proxy for that, but it should be a significant improvement in the real world as well. This has included:
much more extensive continuous integration coverage, including several new jobs.
checksumming image downloads, and retrying images where the checksum fails.
etcd reliability improvements.
refactoring instances and networks to a new “non-volatile” object model where only immutable values are cached.
images now track a state much like instances and networks.
a reworked state model for instances, where its clearer why an instance ended up in an error state. This is documented in our developer docs.
In terms of new features, we also added:
a network ping API, which will emit ICMP ping packets on the network node onto your virtual network. We use this in testing to ensure instances booted and ended up online.
networks are now checked to ensure that they have a reasonable minimum size.
addition of a simple etcd backup and restore tool (sf-backup).
improved data upgrade of previous installations.
VXLAN ids are now randomized, and this has forced a new naming scheme for network interfaces and bridges.
we are smarter about what networks we restore on startup, and don’t restore dead networks.
We also now require python 3.8.
Overall, Shaken Fist v0.4 is a place that makes me much more comfortable to run workloads I care about on that previous releases. Its far from perfect, but we’re definitely moving in the right direction.
This proposal was submitted for FOSDEM 2021. Given that acceptances were meant to be sent out on 25 December and its basically a week later I think we can assume that its been rejected. I’ve recently been writing up my rejected proposals, partially because I’ve put in the effort to write them and they might be useful elsewhere, but also because I think its important to demonstrate that its not unusual for experienced speakers to be rejected from these events.
OpenStack today is a complicated beast — not only does it try to perform well for large clusters, but it also embraces a diverse set of possible implementations from hypervisors, storage, networking, and more. This was a deliberate tactical choice made by the OpenStack community years ago, forming a so called “Big Tent” for vendors to collaborate in to build Open Source cloud options. It made a lot of sense at the time to be honest. However, OpenStack today finds itself constrained by the large number of permutations it must support, ten years of software and backwards compatability legacy, and a decreasing investment from those same vendors that OpenStack courted so actively.
Shaken Fist makes a series of simplifying assumptions that allow it to achieve a surprisingly large amount in not a lot of code. For example, it supports only one hypervisor, one hypervisor OS, one networking implementation, and lacks an image service. It tries hard to be respectful of compute resources while idle, and as fast as possible to deploy resources when requested — its entirely possible to deploy a new VM and start it booting in less than a second for example (if the boot image is already held in cache). Shaken Fist is likely a good choice for small deployments such as home labs and telco edge applications. It is unlikely to be a good choice for large scale compute however.
The other day we released Shaken Fist version 0.2, and I never got around to announcing it here. In fact, we’ve done a minor release since then and have another minor release in the wings ready to go out in the next day or so.
So what’s changed in Shaken Fist between version 0.1 and 0.2? Well, actually kind of a lot…
We moved from MySQL to etcd for storage of persistant state. This was partially done because we wanted distributed locking, but it was also because MySQL was a pain to work with.
Some work has gone into making the API service more production grade, although there is still some work to be done there probably in the 0.3 release — specifically there is a timeout if a response takes more than 300 seconds, which can be the case in launch large VMs where the disk images are not in cache.
There were also some important features added:
Authentication of API requests.
Namespaces (a bit like Kubernetes namespaces or OpenStack projects).
Resource tagging, called metadata.
Support for local mirroring of common disk images.
…and a large number of bug fixes.
Shaken Fist is also now packaged on pypi, and the deployment tooling knows how to install from packages as well as source if that’s a thing you’re interested in. You can read more at shakenfist.com, but that site is a bit of a work in progress at the moment. The new github organisation is at github.com/shakenfist.
A lot has happened in that time — OpenStack has gone from being two separate Open Source projects to a whole ecosystem, developers have come and gone (and passed away), and OpenStack has weathered the cloud wars of the last decade. OpenStack survived its early growth phase by deliberately offering a “big tent” to the community and associated vendors, with an expansive definition of what should be included. This has resulted in most developers being associated with a corporate sponser, and hence the decrease in the number of developers today as corporate interest wanes — OpenStack has never been great at attracting or retaining hobbist contributors.
My personal involvement with OpenStack started in November 2011, so while I missed the very early days I was around for a lot and made many of the mistakes that I now see in OpenStack.
What do I see as mistakes in OpenStack in hindsight? Well, embracing vendors who later lose interest has been painful, and has increased the complexity of the code base significantly. Nova itself is now nearly 400,000 lines of code, and that’s after splitting off many of the original features of Nova such as block storage and networking. Additionally, a lot of our initial assumptions are no longer true — for example in many cases we had to write code to implement things, where there are now good libraries available from third parties.
That’s not to say that OpenStack is without value — I am a daily user of OpenStack to this day, and use at least three OpenStack public clouds at the moment. That said, OpenStack is a complicated beast with a lot of legacy that makes it hard to maintain and slow to change.
For at least six months I’ve felt the desire for a simpler cloud orchestration layer — both for my own personal uses, and also as a test bed for ideas for what a smaller, simpler cloud might look like. My personal use case involves a relatively small environment which echos what we now think of as edge compute — less than 10 RU of machines with a minimum of orchestration and management overhead.
At the time that I was thinking about these things, the Australian bushfires and COVID-19 came along, and presented me with a lot more spare time than I had expected to have. While I’m still blessed to be employed, all of my social activities have been cancelled, so I find myself at home at a loose end on weekends and evenings at lot more than before.
Thus Shaken Fist was born — named for a Simpson’s meme, Shaken Fist is a deliberately small and highly opinionated cloud implementation aimed at working well in small deployments such as homes, labs, edge compute locations, deployed systems, and so forth.
I’d taken a bit of trouble with each feature in Shaken Fist to think through what the simplest and highest value way of doing something is. For example, instances always get a config drive and there is no metadata server. There is also only one supported type of virtual networking, and one supported hypervisor. That said, this means Shaken Fist is less than 5,000 lines of code, and small enough that new things can be implemented very quickly by a single middle aged developer.
Shaken Fist definitely has feature gaps — API authentication and scheduling are the most obvious at the moment — but I have plans to fill those when the time comes.
I’m not sure if Shaken Fist is useful to others, but you never know. Its apache2 licensed, and available on github if you’re interested.