Deciding when to filter out large scale refactorings from code analysis


I want to be able to see the level of change between OpenStack releases. However, there are a relatively small number of changes with simply huge amounts of delta in them. They're generally large refactors, or the deletion which happens when part of a repository is spun out into its own project.

I therefore wanted to explore what a reasonable size for a change in OpenStack is, so that I could decide on a maximum size above which a change is likely to be a refactor and should be filtered away. After playing with a couple of approaches, including just randomly picking a number, the logical way to decide seemed to be to simply plot a histogram of the various change sizes and then pick a reasonable place on the curve as the cutoff. Due to the large range of values (from zero lines of change to over a million!), I ended up deciding a logarithmic axis was the way to go.

For the projects listed in the OpenStack compute starter kit reference set, that produces the following histogram:

[Image: a histogram of the sizes of various OpenStack commits]

Based on that graph, filtering out commits over 10,000 lines of delta feels justified. For reference, the raw histogram buckets are:

Commit size   Count
< 2           25747
< 11          237436
< 101         326314
< 1001        148865
< 10001       16928
< 100001      3277
< 1000001     522
< 10000001    13
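If you want to reproduce that bucketing yourself, here's a rough python sketch. It assumes you already have a list of per-commit delta sizes (insertions plus deletions); the code I actually used may differ:

# Bucket commit sizes into logarithmic buckets (< 2, < 11, < 101, ...)
# and plot them. The delta sizes below are made up examples.
import collections

import matplotlib.pyplot as plt


def bucket_for(delta):
    # Each bucket covers one decade: 0-1, 2-10, 11-100, 101-1000, ...
    bound = 2
    while delta >= bound:
        bound = (bound - 1) * 10 + 1
    return bound


def bucket_commits(deltas):
    counts = collections.Counter(bucket_for(d) for d in deltas)
    return dict(sorted(counts.items()))


counts = bucket_commits([0, 3, 57, 812, 14000, 1200000])
plt.bar([f'< {bound}' for bound in counts], list(counts.values()))
plt.yscale('log')
plt.xlabel('Commit size (lines of delta)')
plt.ylabel('Number of commits')
plt.show()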

A quick summary of OpenStack release tags


I wanted a quick summary of OpenStack git release tags for a talk I am working on, and it turned out to be way more complicated than I expected. I ended up having to compile a table, and then turn that into a code snippet. In case it's useful to anyone else, here it is:

Release  | Release date   | Final release tag
Austin   | October 2010   | 2010.1
Bexar    | February 2011  | 2011.1
Cactus   | April 2011     | 2011.2
Diablo   | September 2011 | 2011.3
Essex    | April 2012     | 2012.1.3
Folsom   | September 2012 | 2012.2.4
Grizzly  | April 2013     | 2013.1.5
Havana   | October 2013   | 2013.2.4
Icehouse | April 2014     | 2014.1.5
Juno     | October 2014   | 2014.2.4
Kilo     | April 2015     | 2015.1.4
Liberty  | October 2015   | Glance: 11.0.2, Keystone: 8.1.2, Neutron: 7.2.0, Nova: 12.0.6
Mitaka   | April 2016     | Glance: 12.0.0, Keystone: 9.3.0, Neutron: 8.4.0, Nova: 13.1.4
Newton   | October 2016   | Glance: 13.0.0, Keystone: 10.0.3, Neutron: 9.4.1, Nova: 14.1.0
Ocata    | February 2017  | Glance: 14.0.1, Keystone: 11.0.4, Neutron: 10.0.7, Nova: 15.1.5
Pike     | August 2017    | Glance: 15.0.2, Keystone: 12.0.3, Neutron: 11.0.8, Nova: 16.1.8
Queens   | February 2018  | Glance: 16.0.1, Keystone: 13.0.4, Neutron: 12.1.1, Nova: 17.0.13
Rocky    | August 2018    | Glance: 17.0.1, Keystone: 14.2.0, Neutron: 13.0.7, Nova: 18.3.0
Stein    | April 2019     | Glance: 18.0.1, Keystone: 15.0.1, Neutron: 14.4.2, Nova: 19.3.2
Train    | October 2019   | Glance: 19.0.4, Keystone: 16.0.1, Neutron: 15.3.0, Nova: 20.4.1
Ussuri   | May 2020       | Glance: 20.0.1, Keystone: 17.0.0, Neutron: 16.2.0, Nova: 21.1.1
Victoria | October 2020   | Glance: 21.0.0, Keystone: 18.0.0, Neutron: 17.0.0, Nova: 22.0.1

Or in python form for those so inclined:

RELEASE_TAGS = {
    'austin': {'all': '2010.1'},
    'bexar': {'all': '2011.1'},
    'cactus': {'all': '2011.2'},
    'diablo': {'all': '2011.3'},
    'essex': {'all': '2012.1.3'},
    'folsom': {'all': '2012.2.4'},
    'grizzly': {'all': '2013.1.5'},
    'havana': {'all': '2013.2.4'},
    'icehouse': {'all': '2014.1.5'},
    'juno': {'all': '2014.2.4'},
    'kilo': {'all': '2015.1.4'},
    'liberty': {
        'glance': '11.0.2',
        'keystone': '8.1.2',
        'neutron': '7.2.0',
        'nova': '12.0.6'
    },
    'mitaka': {
        'glance': '12.0.0',
        'keystone': '9.3.0',
        'neutron': '8.4.0',
        'nova': '13.1.4'
    },
    'newton': {
        'glance': '13.0.0',
        'keystone': '10.0.3',
        'neutron': '9.4.1',
        'nova': '14.1.0'
    },
    'ocata': {
        'glance': '14.0.1',
        'keystone': '11.0.4',
        'neutron': '10.0.7',
        'nova': '15.1.5'
    },
    'pike': {
        'glance': '15.0.2',
        'keystone': '12.0.3',
        'neutron': '11.0.8',
        'nova': '16.1.8'
    },
    'queens': {
        'glance': '16.0.1',
        'keystone': '13.0.4',
        'neutron': '12.1.1',
        'nova': '17.0.13'
    },
    'rocky': {
        'glance': '17.0.1',
        'keystone': '14.2.0',
        'neutron': '13.0.7',
        'nova': '18.3.0'
    },
    'stein': {
        'glance': '18.0.1',
        'keystone': '15.0.1',
        'neutron': '14.4.2',
        'nova': '19.3.2'
    },
    'train': {
        'glance': '19.0.4',
        'keystone': '16.0.1',
        'neutron': '15.3.0',
        'nova': '20.4.1'
    },
    'ussuri': {
        'glance': '20.0.1',
        'keystone': '17.0.0',
        'neutron': '16.2.0',
        'nova': '21.1.1'
    },
    'victoria': {
        'glance': '21.0.0',
        'keystone': '18.0.0',
        'neutron': '17.0.0',
        'nova': '22.0.1'
    }
}

Writing a terraform remote state server


Terraform is a useful tool for deploying cloud resources. This post isn't an introduction to terraform, so I'll assume you already know and love it. If you want an introduction, then this getting started guide would be a sensible place to start.

At its most basic level, terraform deploys cloud resources and stores information about those resources in a file on local disk called terraform.tfstate. It needs that state information so it can make later changes to the deployment, whether that's modifying resources in use or tearing the whole deployment down. If you had an operations team working on an environment, you could store the tfstate file in git or on a shared filesystem so that the entire team could manage the deployment. However, nothing about that approach stops two members of the team from making overlapping changes.

That's where terraform state servers come in. State servers can implement optional locking, which stops overlapping operations from happening. The protocol these servers speak isn't well documented (at least not anywhere I could find), so to explore it I wrote a simple terraform HTTP state server in python.

To use this state server, configure your terraform file as per demo.tf. The important bits are:

terraform {
  backend "http" {
    address = "http://localhost:5000/terraform_state/4cdd0c76-d78b-11e9-9bea-db9cd8374f3a"
    lock_address = "http://localhost:5000/terraform_lock/4cdd0c76-d78b-11e9-9bea-db9cd8374f3a"
    lock_method = "PUT"
    unlock_address = "http://localhost:5000/terraform_lock/4cdd0c76-d78b-11e9-9bea-db9cd8374f3a"
    unlock_method = "DELETE"
  }
}

Obviously the address of the state server will change for your deployment. The UUID in the URL (4cdd0c76-d78b-11e9-9bea-db9cd8374f3a in this case) is an example of an external ID you might use to correlate the terraform state with the system that requested it be built. It doesn't have to be a UUID, it can be any string.

I am using PUT and DELETE for locks due to limitations in the HTTP verbs that the python flask framework exposes. You might be able to get away with the defaults in other languages or frameworks.

To run the python server, make a venv, install the dependencies, and then run:

$ python3 -m venv ~/virtualenvs/remote_state
$ . ~/virtualenvs/remote_state/bin/activate
$ pip install -U -r requirements.txt
$ python stateserver.py
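For the curious, the core of such a server is quite small. Here's a minimal sketch in flask which keeps state and locks in memory; it shows the shape of the protocol (GET and POST for state, PUT and DELETE for the lock, with a 423 response when the lock is already held), but the real stateserver.py may differ:

# A minimal, in-memory terraform HTTP state server sketch using flask.
import flask

app = flask.Flask(__name__)
STATE = {}
LOCKS = {}


@app.route('/terraform_state/<env_id>', methods=['GET'])
def get_state(env_id):
    # Terraform expects a 404 when no state exists yet.
    if env_id not in STATE:
        return '', 404
    return flask.jsonify(STATE[env_id])


@app.route('/terraform_state/<env_id>', methods=['POST'])
def set_state(env_id):
    STATE[env_id] = flask.request.get_json(force=True)
    return '', 200


@app.route('/terraform_lock/<env_id>', methods=['PUT'])
def lock(env_id):
    # Refuse the lock (423 Locked) if someone else already holds it,
    # returning the existing lock metadata so terraform can report it.
    if env_id in LOCKS:
        return flask.jsonify(LOCKS[env_id]), 423
    LOCKS[env_id] = flask.request.get_json(force=True)
    return '', 200


@app.route('/terraform_lock/<env_id>', methods=['DELETE'])
def unlock(env_id):
    LOCKS.pop(env_id, None)
    return '', 200


if __name__ == '__main__':
    app.run(port=5000)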

I hope someone else finds this useful.


Playing with the python prometheus query API


The last few days have been a bit icky around here, with my house apparently proudly residing in the major city with the dirtiest air in the world. So, I needed a distraction…

It has also been quite hot, so I wondered how my energy usage was going. I have prometheus monitoring of my power draw, so now seemed as good a time as any to learn how to do some historical querying over the API. I ended up with a python script which can output things like this: "Yesterday had a maximum temperature of 38 and we used 28.36 kWh. The average for similar days is 25.56 kWh."

The code is on github if it is of interest to others. I am sure I could push more of this processing down into the prometheus engine, but I couldn’t see how to do it today. Hints welcome!
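For a flavour of what those historical queries look like, here's a rough sketch using the prometheus HTTP API directly with requests. The server address and metric name (house_energy_kwh) are assumptions for the example, not necessarily what my script uses:

# Query a day's worth of energy usage from prometheus over its HTTP API.
import datetime

import requests

PROMETHEUS = 'http://localhost:9090'


def query_range(expr, start, end, step='1h'):
    resp = requests.get(
        f'{PROMETHEUS}/api/v1/query_range',
        params={'query': expr,
                'start': start.timestamp(),
                'end': end.timestamp(),
                'step': step})
    resp.raise_for_status()
    return resp.json()['data']['result']


end = datetime.datetime.now()
start = end - datetime.timedelta(days=1)

# increase() over a counter gives the energy used in each step interval.
for series in query_range('increase(house_energy_kwh[1h])', start, end):
    used = sum(float(value) for _, value in series['values'])
    print(f'Used {used:.2f} kWh over the last day')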


Using a MCP4921 or MCP4922 as a SPI DAC for Audio on Raspberry Pi


I've been playing recently with using an MCP4921 as an audio DAC on a Raspberry Pi Zero W, although an MCP4922 would be equivalent (the '22 is a two channel DAC, the '21 is a single channel DAC). This post is my notes on where I got to before I decided this approach wasn't going to work out for me.

My basic requirement was to be able to play sounds on a raspberry pi which already has two SPI buses in use. Thus, adding a SPI DAC seemed like a logical choice. The basic circuit looked like this:

[Image: MCP4921 SPI DAC circuit]

Driving this circuit looked like this (noting that this code was a prototype and isn’t the best ever). The bit that took a while there was realising that the CS line needs to be toggled between 16 bit writes. Once that had been done (which meant moving to a different spidev call), things were on the up and up.
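For reference, driving an MCP4921 from python's spidev module looks roughly like the sketch below, based on the 16 bit write format in the datasheet. The bus and device numbers are assumptions about the wiring, and my actual prototype differs:

# Send 12 bit samples to an MCP4921 over SPI using py-spidev.
import time

import spidev

spi = spidev.SpiDev()
spi.open(0, 0)                 # bus 0, device CE0; adjust for your wiring
spi.max_speed_hz = 1000000


def write_dac(value):
    """Write one 12 bit sample (0-4095) to the DAC."""
    # Config bits: channel A, unbuffered, 1x gain, output enabled.
    high = 0b00110000 | ((value >> 8) & 0x0F)
    low = value & 0xFF
    # xfer() releases CS between transfers, which the DAC needs in
    # order to latch each 16 bit word.
    spi.xfer([high, low])


# A slow ramp, just to show the call pattern.
for sample in range(0, 4096, 16):
    write_dac(sample)
    time.sleep(0.001)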

This was the point I realised that I was at a dead end. I can’t find a way to send the data to the DAC in a way which respects the timing of the audio file. Before I had to do small writes to get the CS line to toggle I could do things fast enough, but not afterwards. Perhaps there’s a DMA option instead, but I haven’t found one yet.

Instead, I think I’m going to go and try PWM based audio. If that doesn’t work, it will be a MAX219 i2c DAC for me!


Introducing GangScan


As some of you might know, I am a Scout Leader. One of the things I do for Scouts is I assist in a minor role with the running of Canberra Gang Show, a theatre production for young people.

One of the things Gang Show cares about is that they need to be able to do rapid roll calls and reporting on who is present at any given time — this is used for working out who is absent before a performance (and therefore needs an understudy), as well as ensuring we know where everyone is in an environment that sometimes has its fire suppression systems isolated.

Before I came along, Canberra Gang Show was doing this with a Windows based attendance tracking application, and 125kHz RFID tags. This system worked just fine, except that the software was clunky and there was only one badge reader. We struggled to explain to the youth that they needed to press the “out” button when logging out, and we wanted to be able to have attendance trackers at other locations in the theatre instead of forcing everyone to flow through a single door.

So, I got thinking. How hard can it be to build something a bit better?

Let’s start with some requirements: simple to deploy and manage; free software (both cost and freedom); more badge readers than what we have now; and low cost.

My basic proposal for such a thing is a Raspberry Pi Zero W, with a small LCD screen and a RFID reader. The device scans badges, and displays a confirmation of scan to the user. If the device can talk to a central server it streams events to it; otherwise it queues them until the server is available and then streams them.
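The store and forward behaviour is conceptually simple. Here's a rough python sketch of the idea; the server URL, spool directory and event format are placeholders for this example rather than what GangScan actually uses:

# Queue scan events to disk, then deliver them when the server is reachable.
import json
import os
import time
import uuid

import requests

SERVER = 'http://example.com/events'
SPOOL_DIR = '/var/spool/gangscan'


def record_scan(card_id):
    # Always write the event to the local spool first.
    os.makedirs(SPOOL_DIR, exist_ok=True)
    event = {'card': card_id, 'timestamp': time.time()}
    name = f'{event["timestamp"]:.6f}-{uuid.uuid4()}.json'
    with open(os.path.join(SPOOL_DIR, name), 'w') as f:
        json.dump(event, f)


def flush_spool():
    # Try to deliver queued events oldest first, stopping at the first
    # failure so nothing is lost or delivered out of order.
    for name in sorted(os.listdir(SPOOL_DIR)):
        path = os.path.join(SPOOL_DIR, name)
        with open(path) as f:
            event = json.load(f)
        try:
            requests.post(SERVER, json=event, timeout=5).raise_for_status()
        except requests.RequestException:
            return
        os.remove(path)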

Sourcing a simple SPI LCD screen and SPI RFID reader from ebay wasn't too hard, and we were off! The only real wart was that I wanted to use 13.56MHz RFID cards, because then I could store some interesting (up to 1kb) data on the card itself. The first version was simply a ribbon cable:

[Image: v0.0, a ribbon cable]

Which then led to me having my first PCB ever made. Let's ignore that it's the wrong size, shall we?

[Image: v0.1, an incorrectly sized PCB]

I’m now at the point where the software for the scanner is reasonable, and there is a bare bones server that does enough roll call that it should be functional. I am sure there’s more to be done, but it works enough to demo. One thing I learned while showing off the device at coffee the other day is that it really needs to make a noise when you scan a badge. I’ve ordered a SPI DAC to play with, which might be the solution there. Other next steps include a newer version of the PCB, and some sort of case solution. I’ll do another post when things progress further.

Oh yes, and I'll eventually release the software too, once it's in a more workable state.


HomeAssistant configuration


I've recently been playing with HomeAssistant, which is quite cool. It's not perfect: for example, it recently broke for me with no debug logs indicating the problem, because it didn't want to terminate SSL any more. But it's better than anything else I've seen so far.

Along the way it's been super handy to be able to refer to other people's HomeAssistant configurations to see how they got things working. So in that spirit, here's my current configuration with all of the secrets pulled out. It's not the most complicated config, but it does do some things which took me a while to get working. Some examples:

  • The Roomba runs when no one is home, and lets me know when its bin is full.
  • A custom component to track when events last occurred so that I can rate limit things like how often the Roomba runs when no one is home.
  • I detect when my wired doorbell goes off, and play a “ding dong” MP3 in the office yurt out the back so I know when someone is visiting.
  • …and probably other things.

I intend to write up interesting things as I think of them, but we’ll see how we go with that.


Learning from the mistakes that even big projects make


The following is a blog post version of a talk presented at pyconau 2018. Slides for the presentation can be found here (as Microsoft powerpoint, or as PDF), and a video of the talk (thanks NextDayVideo!) is below:

[Video of the talk]

OpenStack is an orchestration system for setting up virtual machines and other associated virtual resources, such as networks and storage, on clusters of computers. At a high level, OpenStack is just configuring existing facilities of the host operating system; there isn't really a lot of difference between OpenStack and a room full of system admins frantically resolving tickets requesting that virtual machines be set up. The only real difference is scale and predictability.

To do its job, OpenStack needs to be able to manipulate parts of the operating system which are normally reserved for administrative users. This talk is the story of how OpenStack has done that thing over time, what we learnt along the way, and what I’d do differently if I had my time again. Lots of systems need to do these things, so even if you never use OpenStack hopefully there are things to be learnt here.

Continue reading “Learning from the mistakes that even big projects make”


Rejected talk proposal: Design at scale: OpenStack versus Kubernetes


This proposal was submitted for pyconau 2018. It wasn't accepted, but given I'd put the effort into writing up the proposal I'll post it here in case it's useful some other time. The oblique references to OpenStack are because pycon had an “anonymous” review system in 2018, and I was avoiding saying things which directly identified me as the author.


OpenStack and Kubernetes solve very similar problems. Yet they approach those problems in very different ways. What can we learn from the different approaches taken? The differences aren’t just technical though, there are some interesting social differences too. Continue reading “Rejected talk proposal: Design at scale: OpenStack versus Kubernetes”


Accepted talk proposal: Learning from the mistakes that even big projects make


This proposal was submitted for pyconau 2018. It was accepted, but hasn't been presented yet. The oblique references to OpenStack are because pycon had an “anonymous” review system in 2018, and I was avoiding saying things which directly identified me as the author.


Since 2011, I’ve worked on a large Open Source project in python. It kind of got out of hand – 1000s of developers and millions of lines of code. Yet despite being well resourced, we made the same mistakes that those tiny scripts you whip up to solve a small problem make. Come learn from our fail.

Continue reading “Accepted talk proposal: Learning from the mistakes that even big projects make”
