Shaken Fist v0.4.2

Shaken Fist v0.4.2 snuck out yesterday as part of shooting this tutorial video. That’s because I really wanted to demonstrate floating IPs, which I only recently got working nicely. Overall in v0.4.2 we:

  • Improved CI for image API calls.
  • Improved upgrade CI testing.
  • Improved network state tracking.
  • Floating IPs now work, and have covering CI. shakenfist#257
  • Resolve leaks of floating IPs from both direct use and NAT gateways. shakenfist#256
  • Resolve leaks of IPManagers on network delete. shakenfist#675
  • Use system packages for ansible during install.

Starting your first instance on Shaken Fist (a video tutorial)

As a bit of an experiment, I’ve made this quick and dirty “vlog” style tutorial video to show you how to install Shaken Fist on a single machine and boot your first instance. I demonstrate how to install, set up your first virtual network, start the instance, inspect events that the instance has experienced, and then log in.

Let me know if you think it’s useful.

Books read in January 2021

It’s been 10 years since I’ve read enough to write one of these summary posts… which I guess means something. This month I’ve been thinking a lot about systems design and how to avoid the second system effect while growing a product, which guided my reading choices a fair bit. Much of that reading has been in the form of blog posts and twitter threads, so I am going to start including those in these listings of things I’ve read.


Shaken Fist 0.4.1

I don’t blog about every Shaken Fist release here, but I do feel like the 0.4 release (and the subsequent minor bug fix release 0.4.1) are a pretty big deal in the life of the project.

We also got a cool logo during the v0.4 cycle.

The focus of the v0.4 series is reliability — we’ve used behaviour in the continuous integration pipeline as a proxy for that, but it should be a significant improvement in the real world as well. This has included:

  • much more extensive continuous integration coverage, including several new jobs.
  • checksumming image downloads, and retrying downloads where the checksum fails (sketched after this list).
  • reworked locking.
  • etcd reliability improvements.
  • refactoring instances and networks to a new “non-volatile” object model where only immutable values are cached.
  • images now track a state much like instances and networks.
  • a reworked state model for instances, where it’s clearer why an instance ended up in an error state. This is documented in our developer docs.
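
As a rough illustration of the checksummed download approach, here is a minimal sketch of verifying an image against an expected hash and retrying on mismatch. The function name and retry policy are my own assumptions for illustration, not Shaken Fist’s actual implementation:

import hashlib
import urllib.request


def fetch_image_with_checksum(url, expected_sha256, destination, attempts=3):
    # Download the image, verify its checksum, and retry a limited number
    # of times if the downloaded file does not match.
    for _ in range(attempts):
        urllib.request.urlretrieve(url, destination)

        digest = hashlib.sha256()
        with open(destination, 'rb') as f:
            for chunk in iter(lambda: f.read(1 << 20), b''):
                digest.update(chunk)

        if digest.hexdigest() == expected_sha256:
            return destination

    raise IOError('checksum mismatch after %d attempts for %s' % (attempts, url))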

In terms of new features, we also added:

  • a network ping API, which will emit ICMP ping packets on the network node onto your virtual network. We use this in testing to ensure instances booted and ended up online (see the sketch after this list).
  • networks are now checked to ensure that they have a reasonable minimum size.
  • addition of a simple etcd backup and restore tool (sf-backup).
  • improved data upgrade of previous installations.
  • VXLAN ids are now randomized, and this has forced a new naming scheme for network interfaces and bridges.
  • we are smarter about what networks we restore on startup, and don’t restore dead networks.
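
Conceptually, the ping check looks something like the sketch below: run ping from the network node’s namespace for the virtual network and count the replies. The namespace naming and packet count here are assumptions for illustration only, not the real implementation:

import subprocess


def ping_from_network_node(namespace, address, count=10):
    # Emit ICMP echo requests from the given network namespace and return
    # how many replies we saw.
    result = subprocess.run(
        ['ip', 'netns', 'exec', namespace, 'ping', '-c', str(count), address],
        capture_output=True, text=True)

    replies = 0
    for line in result.stdout.splitlines():
        if 'bytes from' in line:
            replies += 1
    return replies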

We also now require python 3.8.

Overall, Shaken Fist v0.4 is at a place that makes me much more comfortable running workloads I care about on it than previous releases did. It’s far from perfect, but we’re definitely moving in the right direction.

Goals Gone Wild

In 2009 Harvard Business School published a draft paper entitled “Goals Gone Wild“, and its abstract is quite concerning. For example:

“We identify specific side effects associated with goal setting, including a narrow focus that neglects non-goal areas, a rise in unethical behavior, distorted risk preferences, corrosion of organizational culture, and reduced intrinsic motivation.”

Are we doomed? Is all goal setting harmful? Interestingly, I came across this paper while reading Measure What Matters, which argues the exact opposite point — that OKRs provide a meaningful way to improve the productivity of an organization.

The paper starts by listing a series of examples of goal setting gone wrong: Sears’ auto repair business in the early 1990s overcharging customers to meet hourly billable goals; Enron’s sales targets based solely on volume and revenue rather than profit; and Ford Motor Company’s goal of shipping a car at a specific target price point, which resulted in significant safety failures.

The paper then provides specific examples of how goals can go wrong:

  • By being too specific and causing other important features of a task to be ignored — for example, shipping on a specific deadline but skipping adequate testing in order to hit that deadline.
  • By being too numerous — employees with more than one goal tend to focus on one and ignore the others. For example, studies have shown that if you present someone with both quality and quantity goals, they will fixate on the quantity goals over the quality ones.
  • By having an inappropriate time horizon — for example, producing quarterly results by cannibalizing longer term outcomes. Additionally, goals can be perceived as ceilings rather than floors; that is, once a goal has been met, attention is diverted elsewhere instead of over-delivering on the goal.
  • By encouraging inappropriate risk taking or unethical behaviour — if a goal is too challenging, then an employee is encouraged to take risks they would not normally be able to justify in order to meet the goal.
  • Stretch goals that are not met harm employees’ confidence in their abilities and impact future performance.
  • A narrowly focused performance goal discourages learning and collaboration with coworkers. These tasks detract from time spent on the narrowly defined target, and are therefore de-emphasised.

The paper also calls out that while most people can see some amount of intrinsic motivation in their own behaviours, goals are extrinsic motivation and can be overused when applied to an intrinsically motivated workforce.

Overall, the paper urges managers to consider whether the goals they are setting are necessary, and notes that goals should only be used in narrow circumstances.

A super simple sourdough loaf

This is the fourth in a series of posts documenting my adventures in making bread during the COVID-19 shutdown.

This post has been a while coming, but my sister-in-law was interested in the sourdough loaf last night, so I figured I should finally document my process. First off you need to have a sourdough starter, which I wrote up in a previous post. I am sure less cheaty ways will work too, but the cheating was where it was at for me.

Then, you basically follow the process I use for my super simple non-breadmaker loaf, but tweaked a little to use the starter. For the loaf itself:

  • 2 cups of bakers flour (not plain white flour)
  • 1 tea spoon of salt
  • 2 cups of the sourdough starter
  • 1 cup water

Similarly to the super simple loaf, you want the dough to be a bit tacky when mixed — it gets runnier as the yeast does its thing, so it will be too runny if it doesn’t start out tacky.

I then just leave it on the kitchen bench under a cover for the day. In the evening it’s baked like the super simple loaf — heat a high thermal mass dutch oven for 30 minutes at 230 degrees Celsius, and then bake the bread in the dutch oven for the first 30 minutes with the lid on, and then 12 more minutes with the lid off.

You also need to feed the starter when you make the loaf dough. That’s just 1.5 cups of flour and a cup of warm water mixed into the starter after you’ve taken out what you need for the loaf. I tweak the flour to water ratio to keep the starter at a fairly thick consistency, and you’ll learn over time what is right. You basically want pancake batter consistency.

We keep our starter in the fridge and need to feed it (which means baking) twice a week. If we kept it on the bench we’d need to bake daily.

The Mythical Man-Month

I expect everyone (well, almost everyone) involved in some way in software engineering has heard of this book. I decided that it was time to finally read it, largely prompted by this excellent blog post by apenwarr which discusses the second system effect among other things. Now, you can buy this book for a surprisingly large amount of money, but as Michael Carden pointed out, the PDF is also made available for free by the Internet Archive. I’d recommend going that route.

The book is composed of a series of essays, which discuss the trials of the OS/360 team in the mid-1960s, and uses those experiences to attempt to form a series of more general observations on the art of software development and systems engineering.


The Mythical Man-Month, Frederick P. Brooks, Jr., Addison-Wesley Publishing Company, Reading, Mass., 1975, 195 pages.

Deciding when to filter out large scale refactorings from code analysis

I want to be able to see the level of change between OpenStack releases. However, there are a relatively small number of changes with simply huge amounts of delta in them — they’re generally large refactorings, or the deletions which happen when part of a repository is spun out into its own project.

I therefore wanted to explore what was a reasonable size for a change in OpenStack so that I could decide what maximum size to filter away as likely to be a refactor. After playing with a couple of approaches, including just randomly picking a number, it seems the logical way to decide is to simply plot a histogram of the various sizes, and then pick a reasonable place on the curve as the cutoff. Due to the large range of values (from zero lines of change to over a million!), I ended up deciding a logarithmic axis was the way to go.

For the projects listed in the OpenStack compute starter kit reference set, that produces the following histogram:

[Histogram of the sizes of various OpenStack commits]

Filtering out commits over 10,000 lines of delta feels justified based on that graph. For reference, the raw histogram buckets are:

Commit size   Count
< 2           25747
< 11          237436
< 101         326314
< 1001        148865
< 10001       16928
< 100001      3277
< 1000001     522
< 10000001    13
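
If you’re curious how a graph like that might be generated, here is a minimal sketch using matplotlib. The commit-sizes.txt input (one lines-of-delta value per line) is an assumed pre-extracted file rather than something this snippet derives from git itself:

import matplotlib.pyplot as plt
import numpy as np

# One lines-of-delta value per line, extracted from git elsewhere.
commit_sizes = np.loadtxt('commit-sizes.txt')

# Power of ten buckets, roughly matching the table above.
bins = [0, 2, 11, 101, 1001, 10001, 100001, 1000001, 10000001]
plt.hist(commit_sizes, bins=bins)

# symlog tolerates the zero-sized commits that a plain log scale cannot.
plt.xscale('symlog')
plt.xlabel('Lines of delta per commit')
plt.ylabel('Number of commits')
plt.savefig('commit-size-histogram.png')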

A quick summary of OpenStack release tags

I wanted a quick summary of OpenStack git release tags for a talk I am working on, and it turned out to be way more complicated than I expected. I ended up having to compile a table, and then turn that into a code snippet. In case it’s useful to anyone else, here it is:

Release    Release date    Final release tag
Austin     October 2010    2010.1
Bexar      February 2011   2011.1
Cactus     April 2011      2011.2
Diablo     September 2011  2011.3
Essex      April 2012      2012.1.3
Folsom     September 2012  2012.2.4
Grizzly    April 2013      2013.1.5
Havana     October 2013    2013.2.4
Icehouse   April 2014      2014.1.5
Juno       October 2014    2014.2.4
Kilo       April 2015      2015.1.4
Liberty    October 2015    Glance: 11.0.2, Keystone: 8.1.2, Neutron: 7.2.0, Nova: 12.0.6
Mitaka     April 2016      Glance: 12.0.0, Keystone: 9.3.0, Neutron: 8.4.0, Nova: 13.1.4
Newton     October 2016    Glance: 13.0.0, Keystone: 10.0.3, Neutron: 9.4.1, Nova: 14.1.0
Ocata      February 2017   Glance: 14.0.1, Keystone: 11.0.4, Neutron: 10.0.7, Nova: 15.1.5
Pike       August 2017     Glance: 15.0.2, Keystone: 12.0.3, Neutron: 11.0.8, Nova: 16.1.8
Queens     February 2018   Glance: 16.0.1, Keystone: 13.0.4, Neutron: 12.1.1, Nova: 17.0.13
Rocky      August 2018     Glance: 17.0.1, Keystone: 14.2.0, Neutron: 13.0.7, Nova: 18.3.0
Stein      April 2019      Glance: 18.0.1, Keystone: 15.0.1, Neutron: 14.4.2, Nova: 19.3.2
Train      October 2019    Glance: 19.0.4, Keystone: 16.0.1, Neutron: 15.3.0, Nova: 20.4.1
Ussuri     May 2020        Glance: 20.0.1, Keystone: 17.0.0, Neutron: 16.2.0, Nova: 21.1.1
Victoria   October 2020    Glance: 21.0.0, Keystone: 18.0.0, Neutron: 17.0.0, Nova: 22.0.1

Or in python form for those so inclined:

RELEASE_TAGS = {
    'austin': {'all': '2010.1'},
    'bexar': {'all': '2011.1'},
    'cactus': {'all': '2011.2'},
    'diablo': {'all': '2011.3'},
    'essex': {'all': '2012.1.3'},
    'folsom': {'all': '2012.2.4'},
    'grizzly': {'all': '2013.1.5'},
    'havana': {'all': '2013.2.4'},
    'icehouse': {'all': '2014.1.5'},
    'juno': {'all': '2014.2.4'},
    'kilo': {'all': '2015.1.4'},
    'liberty': {
        'glance': '11.0.2',
        'keystone': '8.1.2',
        'neutron': '7.2.0',
        'nova': '12.0.6'
    },
    'mitaka': {
        'glance': '12.0.0',
        'keystone': '9.3.0',
        'neutron': '8.4.0',
        'nova': '13.1.4'
    },
    'newton': {
        'glance': '13.0.0',
        'keystone': '10.0.3',
        'neutron': '9.4.1',
        'nova': '14.1.0'
    },
    'ocata': {
        'glance': '14.0.1',
        'keystone': '11.0.4',
        'neutron': '10.0.7',
        'nova': '15.1.5'
    },
    'pike': {
        'glance': '15.0.2',
        'keystone': '12.0.3',
        'neutron': '11.0.8',
        'nova': '16.1.8'
    },
    'queens': {
        'glance': '16.0.1',
        'keystone': '13.0.4',
        'neutron': '12.1.1',
        'nova': '17.0.13'
    },
    'rocky': {
        'glance': '17.0.1',
        'keystone': '14.2.0',
        'neutron': '13.0.7',
        'nova': '18.3.0'
    },
    'stein': {
        'glance': '18.0.1',
        'keystone': '15.0.1',
        'neutron': '14.4.2',
        'nova': '19.3.2'
    },
    'train': {
        'glance': '19.0.4',
        'keystone': '16.0.1',
        'neutron': '15.3.0',
        'nova': '20.4.1'
    },
    'ussuri': {
        'glance': '20.0.1',
        'keystone': '17.0.0',
        'neutron': '16.2.0',
        'nova': '21.1.1'
    },
    'victoria': {
        'glance': '21.0.0',
        'keystone': '18.0.0',
        'neutron': '17.0.0',
        'nova': '22.0.1'
    }
}
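
And a hypothetical helper showing one way you might use that table: look up the final tag for a release and project, falling back to the shared tag used before the projects diverged:

def final_tag(release, project):
    # Releases up to Kilo used a single tag for all projects, recorded
    # under the 'all' key above.
    tags = RELEASE_TAGS[release.lower()]
    return tags.get(project, tags.get('all'))


print(final_tag('austin', 'nova'))    # 2010.1
print(final_tag('victoria', 'nova'))  # 22.0.1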

Rejected talk proposal: Shaken Fist, thought experiments in simpler IaaS clouds

This proposal was submitted for FOSDEM 2021. Given that acceptances were meant to be sent out on 25 December and it’s basically a week later, I think we can assume that it’s been rejected. I’ve recently been writing up my rejected proposals, partially because I’ve put in the effort to write them and they might be useful elsewhere, but also because I think it’s important to demonstrate that it’s not unusual for experienced speakers to be rejected from these events.


OpenStack today is a complicated beast — not only does it try to perform well for large clusters, but it also embraces a diverse set of possible implementations across hypervisors, storage, networking, and more. This was a deliberate tactical choice made by the OpenStack community years ago, forming a so-called “Big Tent” for vendors to collaborate in to build Open Source cloud options. It made a lot of sense at the time, to be honest. However, OpenStack today finds itself constrained by the large number of permutations it must support, ten years of software and backwards compatibility legacy, and decreasing investment from those same vendors that OpenStack courted so actively.

Shaken Fist makes a series of simplifying assumptions that allow it to achieve a surprisingly large amount in not a lot of code. For example, it supports only one hypervisor, one hypervisor OS, one networking implementation, and lacks an image service. It tries hard to be respectful of compute resources while idle, and as fast as possible to deploy resources when requested — its entirely possible to deploy a new VM and start it booting in less than a second for example (if the boot image is already held in cache). Shaken Fist is likely a good choice for small deployments such as home labs and telco edge applications. It is unlikely to be a good choice for large scale compute however.