Deciding when to filter out large-scale refactorings from code analysis

I want to be able to see the level of change between OpenStack releases. However, a relatively small number of changes contain simply huge amounts of delta. They’re generally large refactors, or the deletion that happens when part of a repository is spun out into its own project.

I therefore wanted to explore what was a reasonable size for a change in OpenStack so that I could decide what maximum size to filter away as likely to be a refactor. After playing with a couple of approaches, including just randomly picking a number, it seems the logical way to decide is to simply plot a histogram of the various sizes, and then pick a reasonable place on the curve as the cutoff. Due to the large range of values (from zero lines of change to over a million!), I ended up deciding a logarithmic axis was the way to go.

For the projects listed in the OpenStack compute starter kit reference set, that produces the following histogram:

[Image: A histogram of the sizes of various OpenStack commits]

Filtering out commits with over 10,000 lines of delta seems justified based on that graph. For reference, the raw histogram buckets are:

Commit size     Count
< 2             25747
< 11            237436
< 101           326314
< 1001          148865
< 10001         16928
< 100001        3277
< 1000001       522
< 10000001      13
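As a rough illustration of how buckets like these could be produced, here is a minimal sketch. It is not the script I actually used: the function names are hypothetical, the input is assumed to be the text output of `git log --numstat --format=commit:%H`, and the bucket edges (powers of ten plus one) are simply read off the table above.

```python
from collections import Counter

# Assumed cutoff, taken from the 10,000 line figure discussed above.
MAX_DELTA = 10000


def parse_numstat(log_text):
    """Return per-commit delta sizes (lines added plus lines deleted).

    Expects the output of: git log --numstat --format=commit:%H
    """
    sizes = []
    current = None
    for line in log_text.splitlines():
        if line.startswith('commit:'):
            # A new commit starts; record the previous one, if any.
            if current is not None:
                sizes.append(current)
            current = 0
            continue
        parts = line.split('\t')
        if current is None or len(parts) < 3:
            continue
        added, deleted = parts[0], parts[1]
        # Binary files report "-" for both counts; treat those as zero.
        current += int(added) if added.isdigit() else 0
        current += int(deleted) if deleted.isdigit() else 0
    if current is not None:
        sizes.append(current)
    return sizes


def bucket_upper_bound(size):
    """Return the exclusive upper bound (2, 11, 101, ...) for a size."""
    bound = 1
    while size > bound:
        bound *= 10
    return bound + 1


def histogram(sizes):
    """Count commits per logarithmic bucket."""
    return Counter(bucket_upper_bound(size) for size in sizes)


def drop_likely_refactors(sizes, max_delta=MAX_DELTA):
    """Keep only the commits at or below the cutoff."""
    return [size for size in sizes if size <= max_delta]
```

Integer multiplication is used for the bucket edges rather than a floating point logarithm, which avoids rounding surprises right at the powers of ten.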

A quick summary of OpenStack release tags

I wanted a quick summary of OpenStack git release tags for a talk I am working on, and it turned out to be way more complicated than I expected. I ended up having to compile a table, and then turn that into a code snippet. In case it’s useful to anyone else, here it is:

Release     Release date      Final release tag
Austin      October 2010      2010.1
Bexar       February 2011     2011.1
Cactus      April 2011        2011.2
Diablo      September 2011    2011.3
Essex       April 2012        2012.1.3
Folsom      September 2012    2012.2.4
Grizzly     April 2013        2013.1.5
Havana      October 2013      2013.2.4
Icehouse    April 2014        2014.1.5
Juno        October 2014      2014.2.4
Kilo        April 2015        2015.1.4
Liberty     October 2015      Glance: 11.0.2, Keystone: 8.1.2, Neutron: 7.2.0, Nova: 12.0.6
Mitaka      April 2016        Glance: 12.0.0, Keystone: 9.3.0, Neutron: 8.4.0, Nova: 13.1.4
Newton      October 2016      Glance: 13.0.0, Keystone: 10.0.3, Neutron: 9.4.1, Nova: 14.1.0
Ocata       February 2017     Glance: 14.0.1, Keystone: 11.0.4, Neutron: 10.0.7, Nova: 15.1.5
Pike        August 2017       Glance: 15.0.2, Keystone: 12.0.3, Neutron: 11.0.8, Nova: 16.1.8
Queens      February 2018     Glance: 16.0.1, Keystone: 13.0.4, Neutron: 12.1.1, Nova: 17.0.13
Rocky       August 2018       Glance: 17.0.1, Keystone: 14.2.0, Neutron: 13.0.7, Nova: 18.3.0
Stein       April 2019        Glance: 18.0.1, Keystone: 15.0.1, Neutron: 14.4.2, Nova: 19.3.2
Train       October 2019      Glance: 19.0.4, Keystone: 16.0.1, Neutron: 15.3.0, Nova: 20.4.1
Ussuri      May 2020          Glance: 20.0.1, Keystone: 17.0.0, Neutron: 16.2.0, Nova: 21.1.1
Victoria    October 2020      Glance: 21.0.0, Keystone: 18.0.0, Neutron: 17.0.0, Nova: 22.0.1

Or in Python form for those so inclined:

RELEASE_TAGS = {
    'austin': {'all': '2010.1'},
    'bexar': {'all': '2011.1'},
    'cactus': {'all': '2011.2'},
    'diablo': {'all': '2011.3'},
    'essex': {'all': '2012.1.3'},
    'folsom': {'all': '2012.2.4'},
    'grizzly': {'all': '2013.1.5'},
    'havana': {'all': '2013.2.4'},
    'icehouse': {'all': '2014.1.5'},
    'juno': {'all': '2014.2.4'},
    'kilo': {'all': '2015.1.4'},
    'liberty': {
        'glance': '11.0.2',
        'keystone': '8.1.2',
        'neutron': '7.2.0',
        'nova': '12.0.6'
    },
    'mitaka': {
        'glance': '12.0.0',
        'keystone': '9.3.0',
        'neutron': '8.4.0',
        'nova': '13.1.4'
    },
    'newton': {
        'glance': '13.0.0',
        'keystone': '10.0.3',
        'neutron': '9.4.1',
        'nova': '14.1.0'
    },
    'ocata': {
        'glance': '14.0.1',
        'keystone': '11.0.4',
        'neutron': '10.0.7',
        'nova': '15.1.5'
    },
    'pike': {
        'glance': '15.0.2',
        'keystone': '12.0.3',
        'neutron': '11.0.8',
        'nova': '16.1.8'
    },
    'queens': {
        'glance': '16.0.1',
        'keystone': '13.0.4',
        'neutron': '12.1.1',
        'nova': '17.0.13'
    },
    'rocky': {
        'glance': '17.0.1',
        'keystone': '14.2.0',
        'neutron': '13.0.7',
        'nova': '18.3.0'
    },
    'stein': {
        'glance': '18.0.1',
        'keystone': '15.0.1',
        'neutron': '14.4.2',
        'nova': '19.3.2'
    },
    'train': {
        'glance': '19.0.4',
        'keystone': '16.0.1',
        'neutron': '15.3.0',
        'nova': '20.4.1'
    },
    'ussuri': {
        'glance': '20.0.1',
        'keystone': '17.0.0',
        'neutron': '16.2.0',
        'nova': '21.1.1'
    },
    'victoria': {
        'glance': '21.0.0',
        'keystone': '18.0.0',
        'neutron': '17.0.0',
        'nova': '22.0.1'
    }
}
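One way a dictionary shaped like this might be consumed is sketched below. The `final_tag` helper is hypothetical and not part of any OpenStack tooling, and the dictionary here is only an abbreviated copy of the one above so that the example runs on its own. It assumes an 'all' entry means every project shared that tag, and it makes no attempt to check whether a given project actually existed in that release.

```python
# Abbreviated copy of the dictionary above, so the sketch is self-contained.
RELEASE_TAGS = {
    'kilo': {'all': '2015.1.4'},
    'liberty': {
        'glance': '11.0.2',
        'keystone': '8.1.2',
        'neutron': '7.2.0',
        'nova': '12.0.6'
    },
}


def final_tag(release, project):
    """Return the final release tag for a project (hypothetical helper)."""
    tags = RELEASE_TAGS[release.lower()]
    # Releases up to Kilo use one date-based tag for every project.
    if 'all' in tags:
        return tags['all']
    # Later releases version each project separately.
    return tags[project.lower()]
```

With that in place, something like `final_tag('liberty', 'nova')` gives a string that could be handed to `git checkout` or `git diff` in the relevant repository.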

Rejected talk proposal: Shaken Fist, thought experiments in simpler IaaS clouds

This proposal was submitted for FOSDEM 2021. Given that acceptances were meant to be sent out on 25 December and it’s basically a week later, I think we can assume that it’s been rejected. I’ve recently been writing up my rejected proposals, partially because I’ve put in the effort to write them and they might be useful elsewhere, but also because I think it’s important to demonstrate that it’s not unusual for experienced speakers to be rejected from these events.


OpenStack today is a complicated beast: not only does it try to perform well for large clusters, but it also embraces a diverse set of possible implementations for hypervisors, storage, networking, and more. This was a deliberate tactical choice made by the OpenStack community years ago, forming a so-called “Big Tent” in which vendors could collaborate to build Open Source cloud options. It made a lot of sense at the time, to be honest. However, OpenStack today finds itself constrained by the large number of permutations it must support, ten years of software and backwards compatibility legacy, and decreasing investment from those same vendors that OpenStack courted so actively.

Shaken Fist makes a series of simplifying assumptions that allow it to achieve a surprisingly large amount in not a lot of code. For example, it supports only one hypervisor, one hypervisor OS, one networking implementation, and lacks an image service. It tries hard to be respectful of compute resources while idle, and as fast as possible to deploy resources when requested. It’s entirely possible to deploy a new VM and start it booting in less than a second, for example (if the boot image is already held in cache). Shaken Fist is likely a good choice for small deployments such as home labs and telco edge applications. It is unlikely to be a good choice for large-scale compute, however.

Introducing Shaken Fist

The first public commit to what would become OpenStack Nova was made ten years ago today — at Thu May 27 23:05:26 2010 PDT to be exact. So first off, happy tenth birthday to Nova!

A lot has happened in that time: OpenStack has gone from being two separate Open Source projects to a whole ecosystem, developers have come and gone (and passed away), and OpenStack has weathered the cloud wars of the last decade. OpenStack survived its early growth phase by deliberately offering a “big tent” to the community and associated vendors, with an expansive definition of what should be included. This has resulted in most developers being associated with a corporate sponsor, and hence the decrease in the number of developers today as corporate interest wanes. OpenStack has never been great at attracting or retaining hobbyist contributors.

My personal involvement with OpenStack started in November 2011, so while I missed the very early days I was around for a lot and made many of the mistakes that I now see in OpenStack.

What do I see as mistakes in OpenStack in hindsight? Well, embracing vendors who later lose interest has been painful, and has increased the complexity of the code base significantly. Nova itself is now nearly 400,000 lines of code, and that’s after splitting off many of the original features of Nova such as block storage and networking. Additionally, a lot of our initial assumptions are no longer true — for example in many cases we had to write code to implement things, where there are now good libraries available from third parties.

That’s not to say that OpenStack is without value — I am a daily user of OpenStack to this day, and use at least three OpenStack public clouds at the moment. That said, OpenStack is a complicated beast with a lot of legacy that makes it hard to maintain and slow to change.

For at least six months I’ve felt the desire for a simpler cloud orchestration layer, both for my own personal uses and as a test bed for ideas about what a smaller, simpler cloud might look like. My personal use case involves a relatively small environment which echoes what we now think of as edge compute: less than 10 RU of machines with a minimum of orchestration and management overhead.

At the time that I was thinking about these things, the Australian bushfires and COVID-19 came along, and presented me with a lot more spare time than I had expected to have. While I’m still blessed to be employed, all of my social activities have been cancelled, so I find myself at home at a loose end on weekends and evenings a lot more than before.

Thus Shaken Fist was born. Named for a Simpsons meme, Shaken Fist is a deliberately small and highly opinionated cloud implementation aimed at working well in small deployments such as homes, labs, edge compute locations, deployed systems, and so forth.

I’ve taken a bit of trouble with each feature in Shaken Fist to think through what the simplest and highest value way of doing something is. For example, instances always get a config drive and there is no metadata server. There is also only one supported type of virtual networking, and one supported hypervisor. That said, this means Shaken Fist is less than 5,000 lines of code, and small enough that new things can be implemented very quickly by a single middle-aged developer.

Shaken Fist definitely has feature gaps — API authentication and scheduling are the most obvious at the moment — but I have plans to fill those when the time comes.

I’m not sure if Shaken Fist is useful to others, but you never know. It’s Apache 2.0 licensed, and available on GitHub if you’re interested.

What’s missing from the ONAP community — an open design process

I’ve been thinking a fair bit about ONAP and its future releases recently. This is in the context of trying to implement a system for a client which is based on ONAP. That’s proving really hard though, because it’s difficult to determine how various components of ONAP are intended to work, or interoperate.

It took me a while, but I’ve realised what’s missing here…

OpenStack has an open design process. If you want to add a new feature to Nova for example, the first step is to write down what the feature is intended to do, how it integrates with the rest of Nova, and how people might use it. The target audience for that document is both the Nova development team and the people who operate OpenStack deployments.

ONAP has no equivalent that I can find. So for example, they say that in Casablanca they are going to implement an “AAI Enricher” to ease lookup of data from external systems in their inventory database, but I can’t find anywhere where they explain how the integration between arbitrary external systems and ONAP AAI will work.

I think ONAP would really benefit from a good hard look at their design processes and how approachable they are for people outside their development teams. The current use case proposal process (videos, conference talks, and PowerPoint presentations) just isn’t great for people who are trying to figure out how to deploy their software.

Learning from the mistakes that even big projects make

The following is a blog post version of a talk presented at pyconau 2018. Slides for the presentation can be found here (as Microsoft PowerPoint, or as PDF), and a video of the talk (thanks NextDayVideo!) is below:

 

OpenStack is an orchestration system for setting up virtual machines and other associated virtual resources, such as networks and storage, on clusters of computers. At a high level, OpenStack is just configuring existing facilities of the host operating system. There isn’t really a lot of difference between OpenStack and a room full of system admins frantically resolving tickets requesting that virtual machines be set up. The only real difference is scale and predictability.

To do its job, OpenStack needs to be able to manipulate parts of the operating system which are normally reserved for administrative users. This talk is the story of how OpenStack has done that thing over time, what we learnt along the way, and what I’d do differently if I had my time again. Lots of systems need to do these things, so even if you never use OpenStack hopefully there are things to be learnt here.


pyconau 2018 call for proposals now open

The pyconau call for proposals is now open, and runs until 28 May. I took my teenagers to pyconau last year and they greatly enjoyed it. I hadn’t been to a pyconau in ages, and ended up really enjoying thinking about things from topic areas I don’t normally need to think about. I think expanding one’s horizons is generally a good idea.

Should I propose something for this year? I am unsure. Some random ideas that immediately spring to mind:

  • something about privsep: I think a generalised way to make privileged calls in unprivileged code is quite interesting, especially in a language which is often used for systems management and integration tasks. That said, perhaps it’s too OpenStacky given how disinterested in OpenStack talks most Python people seem to be.
  • nova-warts: for a long time my hobby has been cleaning up historical mistakes made in OpenStack Nova that won’t ever rate as a major feature change. What lessons can other projects learn from a well-funded and heavily staffed project that still thought that exec() was a great way to do important work? There’s definitely an overlap with the privsep talk above, but this would be more general.
  • a talk about how I had to manage some code which only worked in python2, and some other code that only worked in python3, and in the end gave up on venvs and decided that Docker containers are like the ultimate venvs. That said, I suspect this is old hat and was obvious to everyone except me.
  • something else I haven’t thought of.

Anyways, I’m undecided. Comments welcome.

Also, here’s an image for this post. It’s the stone henge we found at Guerilla Bay last weekend. I assume it’s in frequent use by tiny tiny druids.

On Selecting a Well Engaged Open Source Vendor

Aptira is in an interesting position in the Open Source market, because we don’t usually sell software. Instead, our customers come to us seeking assistance with deciding which OpenStack to use, or how to embed ONAP into their nationwide networks, or how to move their legacy networks to the software defined future. Therefore, our most common role is as a trusted advisor to help our customers decide which Open Source products to buy.

(My boss would insist that I point out here that we do customisation of Open Source for our customers, and have assisted many in the past with deploying pure upstream solutions. Basically, we do what is the right fit for the customer, and aren’t obsessed with fitting customers into pre-defined moulds that suit our partners.)

That makes it important that we recommend products from companies that are well engaged with their upstream Open Source communities. That might be OpenStack, or ONAP, or even something like OpenDaylight. This raises the obvious question – what makes a company well engaged with an upstream project?

Read more over at my employer’s blog

Call for presentations for the linux.conf.au 2014 OpenStack mini-conference

I’ve just emailed this out to the relevant lists, but I figured it can’t hurt to post it here as well…

linux.conf.au will be hosting the second OpenStack mini-conference to
run in Australia. The first one was well attended, and this
mini-conference will be the first OpenStack conference to be held on
Australia’s west coast. The mini-conference is a day long event
focusing on OpenStack development and operations, and is available to
attendees of linux.conf.au.

The mini-conference is therefore calling for proposals for content.
Speakers at the mini-conference must be registered for linux.conf.au
2014 as delegates, or discuss their needs with the mini-conference
organizers if that isn’t possible.

Some examples of talks we’re interested in are: talks from OpenStack
developers about what features they are working on for Icehouse; talks
from deployers of OpenStack about their experiences and how others can
learn from them; talks covering the functionality of OpenStack and how
it can be used in new and interesting ways.

Some important details:

  • linux.conf.au runs from 6 to 10 January 2014 in Perth, Australia at
    the University of Western Australia

  • the mini-conference will be on Tuesday the 7th of January
  • proposals are due to the mini-conference organizer no later than 1 November
  • there are two types of talks — full length (45 minutes) and half
    length (20 minutes)

CFP submissions are made by completing this online form:
CFP submission form

If you have questions about this call for presentations, please
contact Michael Still at openstack-lca2014@lists.stillhq.com for more
details.