Exploring more efficient remote large file storage

My primary personal project these days is a thing called Shaken Fist — it is an infrastructure as a service cloud akin to OpenStack Compute, but smaller and simpler. Shaken Fist doesn’t have an equivalent to the OpenStack Image service, instead letting you describe your instance images with a standard URL. One of the things Shaken Fist does to be easier to use is maintain an official repository of common images, which allows users to refer to those images with a shorthand syntax instead of a complete URL. The images also contain small customizations (mainly the Shaken Fist in-guest agent), which means I can’t just use the official upstream cloud images like OpenStack does.

The images were stored at DreamHost until this week, when a robot decided that they looked like offline backups, despite being served to the Internet via HTTP and being used regularly (although admittedly not frequently). DreamHost unilaterally decided to delete the web site, so now I am looking for new image hosting services, and thinking about better ways to build an image store.

(Oh, and I’m now recommending to anyone who asks that they consider someone less capricious than DreamHost for their hosting needs.)


All python packages require a pyproject.toml with modern pip

So last night Shaken Fist CI jobs started failing with errors like this (edited lightly for clarity):

Building wheels for collected packages: shakenfist-ci
  Building wheel for shakenfist-ci (setup.py): started
  Building wheel for shakenfist-ci (setup.py): finished with status 'error'
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [86 lines of output]
...
      ...setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        setuptools.SetuptoolsDeprecationWarning,
      installing to build/bdist.linux-x86_64/wheel
      running install
...
      warning: install_lib: byte-compiling is disabled, skipping.
      
      running install_egg_info
      Copying shakenfist_ci.egg-info to build/bdist.linux-x86_64/wheel/shakenfist_ci-0.0.1.dev2544-py3.7.egg-info
      running install_scripts
      error: invalid command 'bdist_wininst'
      [end of output]

This was pretty concerning. I know that a setup.py / setup.cfg style install is a little old school, but it was unexpected that it broke entirely. At first I thought I’d have to convert to poetry to unblock this, but Chet helpfully pointed out that the fix is as simple as adding a pyproject.toml file to the directory which contains your setup.py and setup.cfg. The basic issue is that a modern pip no longer assumes you’re using setuptools, so you need to tell it that in pyproject.toml. Then you’re unblocked.

So, just create a file named pyproject.toml in the setup.py / setup.cfg directory which contains this:

[build-system]
requires = ["setuptools >= 40.6.0", "wheel"]
build-backend = "setuptools.build_meta"

And you’re good to go. If you’re really curious, this page was quite helpful in working out what was happening.

Debian 10 buster bcrypt pip install breakage

So, as of today my Shaken Fist CI jobs for Debian 10 are failing to install bcrypt, with an error that looks like this:

Running setup.py install for bcrypt: started
    Running setup.py install for bcrypt: finished with status 'error'
    [ ... snip ... ]
    running build_rust
    
        =============================DEBUG ASSISTANCE=============================
        If you are seeing a compilation error please try the following steps to
        successfully install bcrypt:
        1) Upgrade to the latest pip and try again. This will fix errors for most
           users. See: https://pip.pypa.io/en/stable/installing/#upgrading-pip
        2) Ensure you have a recent Rust toolchain installed. bcrypt requires
           rustc >= 1.56.0.
    
        Python: 3.7.3
        platform: Linux-4.19.0-21-amd64-x86_64-with-debian-10.12
        pip: 18.1
        setuptools: 65.2.0
        setuptools_rust: 1.5.1
        rustc: n/a
        =============================DEBUG ASSISTANCE=============================

I’m not really interested in debating why installing a python package requires a rust compiler; that has been discussed elsewhere.

This specific breakage has been caused by bcrypt releasing 4.0.0, which has this in the changelog: “bcrypt is now implemented in Rust. Users building from source will need to have a Rust compiler available. Nothing will change for users downloading wheels.”

Unfortunately, you can’t just install rustc with apt, as it is both quite big (350MB) and too old (version 1.41.1 versus the required 1.56.0 or better). I also couldn’t find an Ubuntu PPA to misuse to get a more recent rustc.

Another answer here is rustup, which is yet another curl-piped-to-a-root-shell installer, and that isn’t a satisfying answer to me. The other option is of course to just pin bcrypt to pre-4.0.0, but as best as I can tell I’d have to do that on every distribution, not just Debian 10.

Update: and then I re-read the ChangeLog. It turns out that pip wasn’t offering me wheels because the version of pip was too old. As long as you’re ok with not using an official Debian packaged version of pip, you can do this to get unstuck:

# pip3 install -U pip
# apt-get remove python3-pip
# /usr/local/bin/pip3 install -v bcrypt==4.0.0

Deciding when to filter out large scale refactorings from code analysis

I want to be able to see the level of change between OpenStack releases. However, there are a relatively small number of changes with simply huge amounts of delta in them — they’re generally large refactors, or the deletes which happen when part of a repository is spun out into its own project.

I therefore wanted to explore what was a reasonable size for a change in OpenStack so that I could decide what maximum size to filter away as likely to be a refactor. After playing with a couple of approaches, including just randomly picking a number, it seems the logical way to decide is to simply plot a histogram of the various sizes, and then pick a reasonable place on the curve as the cutoff. Due to the large range of values (from zero lines of change to over a million!), I ended up deciding a logarithmic axis was the way to go.

For the projects listed in the OpenStack compute starter kit reference set, that produces the following histogram:

A histogram of the sizes of various OpenStack commits

I feel that filtering out commits over 10,000 lines of delta is justified based on that graph. For reference, the raw histogram buckets are:

Commit size | Count
< 2         | 25747
< 11        | 237436
< 101       | 326314
< 1001      | 148865
< 10001     | 16928
< 100001    | 3277
< 1000001   | 522
< 10000001  | 13
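
If you’re curious how a plot like that can be produced, here is a minimal sketch using matplotlib and numpy (both my assumptions; the original analysis script isn’t shown here), starting from a list of per-commit delta sizes:

import numpy as np
import matplotlib.pyplot as plt

# delta_sizes would be the number of changed lines for each commit,
# extracted however you mine your git repositories.
delta_sizes = [3, 17, 250, 14000, 120]   # placeholder data

# Logarithmic buckets, because sizes range from zero to over a million lines.
bins = np.logspace(0, 7, num=8)
plt.hist(delta_sizes, bins=bins)
plt.xscale('log')
plt.xlabel('Lines of delta per commit')
plt.ylabel('Number of commits')
plt.savefig('commit_sizes.png')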

A quick summary of OpenStack release tags

I wanted a quick summary of OpenStack git release tags for a talk I am working on, and it turned out to be way more complicated than I expected. I ended up having to compile a table, and then turn that into a code snippet. In case it’s useful to anyone else, here it is:

Release  | Release date   | Final release tag
Austin   | October 2010   | 2010.1
Bexar    | February 2011  | 2011.1
Cactus   | April 2011     | 2011.2
Diablo   | September 2011 | 2011.3
Essex    | April 2012     | 2012.1.3
Folsom   | September 2012 | 2012.2.4
Grizzly  | April 2013     | 2013.1.5
Havana   | October 2013   | 2013.2.4
Icehouse | April 2014     | 2014.1.5
Juno     | October 2014   | 2014.2.4
Kilo     | April 2015     | 2015.1.4
Liberty  | October 2015   | Glance: 11.0.2, Keystone: 8.1.2, Neutron: 7.2.0, Nova: 12.0.6
Mitaka   | April 2016     | Glance: 12.0.0, Keystone: 9.3.0, Neutron: 8.4.0, Nova: 13.1.4
Newton   | October 2016   | Glance: 13.0.0, Keystone: 10.0.3, Neutron: 9.4.1, Nova: 14.1.0
Ocata    | February 2017  | Glance: 14.0.1, Keystone: 11.0.4, Neutron: 10.0.7, Nova: 15.1.5
Pike     | August 2017    | Glance: 15.0.2, Keystone: 12.0.3, Neutron: 11.0.8, Nova: 16.1.8
Queens   | February 2018  | Glance: 16.0.1, Keystone: 13.0.4, Neutron: 12.1.1, Nova: 17.0.13
Rocky    | August 2018    | Glance: 17.0.1, Keystone: 14.2.0, Neutron: 13.0.7, Nova: 18.3.0
Stein    | April 2019     | Glance: 18.0.1, Keystone: 15.0.1, Neutron: 14.4.2, Nova: 19.3.2
Train    | October 2019   | Glance: 19.0.4, Keystone: 16.0.1, Neutron: 15.3.0, Nova: 20.4.1
Ussuri   | May 2020       | Glance: 20.0.1, Keystone: 17.0.0, Neutron: 16.2.0, Nova: 21.1.1
Victoria | October 2020   | Glance: 21.0.0, Keystone: 18.0.0, Neutron: 17.0.0, Nova: 22.0.1

Or in python form for those so inclined:

RELEASE_TAGS = {
    'austin': {'all': '2010.1'},
    'bexar': {'all': '2011.1'},
    'cactus': {'all': '2011.2'},
    'diablo': {'all': '2011.3'},
    'essex': {'all': '2012.1.3'},
    'folsom': {'all': '2012.2.4'},
    'grizzly': {'all': '2013.1.5'},
    'havana': {'all': '2013.2.4'},
    'icehouse': {'all': '2014.1.5'},
    'juno': {'all': '2014.2.4'},
    'kilo': {'all': '2015.1.4'},
    'liberty': {
        'glance': '11.0.2',
        'keystone': '8.1.2',
        'neutron': '7.2.0',
        'nova': '12.0.6'
    },
    'mitaka': {
        'glance': '12.0.0',
        'keystone': '9.3.0',
        'neutron': '8.4.0',
        'nova': '13.1.4'
    },
    'newton': {
        'glance': '13.0.0',
        'keystone': '10.0.3',
        'neutron': '9.4.1',
        'nova': '14.1.0'
    },
    'ocata': {
        'glance': '14.0.1',
        'keystone': '11.0.4',
        'neutron': '10.0.7',
        'nova': '15.1.5'
    },
    'pike': {
        'glance': '15.0.2',
        'keystone': '12.0.3',
        'neutron': '11.0.8',
        'nova': '16.1.8'
    },
    'queens': {
        'glance': '16.0.1',
        'keystone': '13.0.4',
        'neutron': '12.1.1',
        'nova': '17.0.13'
    },
    'rocky': {
        'glance': '17.0.1',
        'keystone': '14.2.0',
        'neutron': '13.0.7',
        'nova': '18.3.0'
    },
    'stein': {
        'glance': '18.0.1',
        'keystone': '15.0.1',
        'neutron': '14.4.2',
        'nova': '19.3.2'
    },
    'train': {
        'glance': '19.0.4',
        'keystone': '16.0.1',
        'neutron': '15.3.0',
        'nova': '20.4.1'
    },
    'ussuri': {
        'glance': '20.0.1',
        'keystone': '17.0.0',
        'neutron': '16.2.0',
        'nova': '21.1.1'
    },
    'victoria': {
        'glance': '21.0.0',
        'keystone': '18.0.0',
        'neutron': '17.0.0',
        'nova': '22.0.1'
    }
}
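
And a tiny helper for looking things up in that structure, since the early releases used a single tag while the later ones are per-project (this helper is hypothetical, just to show the intended use):

def final_release_tag(release, project):
    # Early releases tagged everything together ('all'); later releases
    # have a tag per project.
    tags = RELEASE_TAGS[release.lower()]
    return tags.get(project, tags.get('all'))

print(final_release_tag('rocky', 'nova'))   # 18.3.0
print(final_release_tag('kilo', 'nova'))    # 2015.1.4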

Writing a terraform remote state server

Terraform is a useful tool for deploying cloud resources. This post isn’t an introduction to terraform, so I’ll assume you already know and love it. If you want more, then this getting started guide would be a sensible start.

At its most basic level, terraform deploys cloud resources and stores information about those resources in a file on local disk called terraform.tfstate — it needs that state information so it can make later changes to the deployment, whether that’s modifying resources in use or tearing the whole deployment down. If you had an operations team working on an environment, then you could store the tfstate file in git or a shared filesystem so that the entire team could manage the deployment. However, there is nothing in that approach that stops two members of the team from making overlapping changes.

That’s where terraform state servers come in. State servers can implement optional locking, which stops overlapping operations from happening. The protocol that these servers talk isn’t well documented (that I could find). I wanted to explore that more, so I wrote a simple terraform HTTP state server in python.

To use this state server, configure your terraform file as per demo.tf. The important bits are:

terraform {
  backend "http" {
    address = "http://localhost:5000/terraform_state/4cdd0c76-d78b-11e9-9bea-db9cd8374f3a"
    lock_address = "http://localhost:5000/terraform_lock/4cdd0c76-d78b-11e9-9bea-db9cd8374f3a"
    lock_method = "PUT"
    unlock_address = "http://localhost:5000/terraform_lock/4cdd0c76-d78b-11e9-9bea-db9cd8374f3a"
    unlock_method = "DELETE"
  }
}

Where the URL to the state server will obviously change. The UUID in the URL (4cdd0c76-d78b-11e9-9bea-db9cd8374f3a in this case) is an example of an external ID you might use to correlate the terraform state with the system that requested it be built. It doesn’t have to be a UUID, it can be any string.

I am using PUT and DELETE for locks due to limitations in the HTTP verbs that the python flask framework exposes. You might be able to get away with the defaults in other languages or frameworks.
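
The real stateserver.py has more to it, but to give a feel for the shape of the protocol, here is a minimal in-memory sketch of the endpoints involved. The 423 status for a lock conflict is my understanding of what terraform expects from an HTTP backend, and the sketch skips persistence and error handling entirely:

# A minimal, in-memory sketch of the state server protocol -- not the
# real stateserver.py.
from flask import Flask, jsonify, request

app = Flask(__name__)
STATE = {}   # external id -> terraform state document
LOCKS = {}   # external id -> lock info posted by terraform

@app.route('/terraform_state/<state_id>', methods=['GET'])
def get_state(state_id):
    if state_id not in STATE:
        return '', 404
    return jsonify(STATE[state_id])

@app.route('/terraform_state/<state_id>', methods=['POST'])
def store_state(state_id):
    STATE[state_id] = request.get_json(force=True)
    return '', 200

@app.route('/terraform_lock/<state_id>', methods=['PUT'])
def lock(state_id):
    if state_id in LOCKS:
        # Already locked; terraform treats this as a lock conflict.
        return jsonify(LOCKS[state_id]), 423
    LOCKS[state_id] = request.get_json(force=True)
    return '', 200

@app.route('/terraform_lock/<state_id>', methods=['DELETE'])
def unlock(state_id):
    LOCKS.pop(state_id, None)
    return '', 200

if __name__ == '__main__':
    app.run(port=5000)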

To run the python server, make a venv, install the dependencies, and then run:

$ python3 -m venv ~/virtualenvs/remote_state
$ . ~/virtualenvs/remote_state/bin/activate
$ pip install -U -r requirements.txt
$ python stateserver.py

I hope someone else finds this useful.

Playing with the python prometheus query API

The last few days have been a bit icky around here, with my house apparently proudly residing in the major city with the dirtiest air in the world. So, I needed a distraction…

It has also been quite hot, so I wondered how my energy usage was going. I have prometheus monitoring of my power draw, so now seemed as good a time as any to learn how to do some historical querying over the API. I ended up with a python script which can output things like this: “Yesterday had a maximum temperature of 38 and we used 28.36 kWh. The average for similar days is 25.56 kWh.”

The code is on github if it is of interest to others. I am sure I could push more of this processing down into the prometheus engine, but I couldn’t see how to do it today. Hints welcome!
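
The interesting part is really just the prometheus HTTP API. Here is a rough sketch of the sort of query_range call involved; the metric name (power_watts) and server address are invented for illustration, so substitute whatever your exporter actually publishes:

# A rough sketch of a historical query against the prometheus HTTP API.
import datetime

import requests

PROMETHEUS = 'http://localhost:9090'

def kwh_for_day(day, metric='power_watts'):
    start = datetime.datetime.combine(day, datetime.time.min)
    end = start + datetime.timedelta(days=1)

    resp = requests.get(
        '%s/api/v1/query_range' % PROMETHEUS,
        params={
            'query': metric,
            'start': start.timestamp(),
            'end': end.timestamp(),
            'step': '60s',
        })
    resp.raise_for_status()

    results = resp.json()['data']['result']
    if not results:
        return 0.0

    # Each value is a [timestamp, value] pair; average the day's power
    # draw and convert watts over 24 hours into kilowatt hours.
    samples = [float(v) for _, v in results[0]['values']]
    return (sum(samples) / len(samples)) * 24 / 1000

yesterday = datetime.date.today() - datetime.timedelta(days=1)
print('Yesterday we used %.2f kWh' % kwh_for_day(yesterday))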

Using a MCP4921 or MCP4922 as a SPI DAC for Audio on Raspberry Pi

I’ve been playing recently with using an MCP4921 as an audio DAC on a Raspberry Pi Zero W, although an MCP4922 would be equivalent (the ’22 is a two channel DAC, the ’21 is a single channel DAC). This post is my notes on where I got to before I decided this approach wasn’t going to work out for me.

My basic requirement was to be able to play sounds on a raspberry pi which already has two SPI buses in use. Thus, adding a SPI DAC seemed like a logical choice. The basic circuit looked like this:

MCP4921 SPI DAC circuit

Driving this circuit looked like this (noting that this code was a prototype and isn’t the best ever). The bit that took a while there was realising that the CS line needs to be toggled between 16 bit writes. Once that had been done (which meant moving to a different spidev call), things were on the up and up.
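
As a rough illustration of the idea (this is a sketch, not the prototype code referred to above; the config bits come from the MCP4921 datasheet, and I’m assuming the usual py-spidev behaviour where each xfer2() call asserts chip select for just that transfer), writing a single 12 bit sample looks something like this:

# A sketch of writing one 12 bit sample to the MCP4921 over SPI.
import spidev

spi = spidev.SpiDev()
spi.open(0, 0)              # SPI bus and device; adjust for your wiring
spi.max_speed_hz = 1000000

def write_sample(sample):
    # The MCP4921 wants a 16 bit word: four config bits (channel A,
    # unbuffered, 1x gain, output enabled) then 12 bits of data.
    word = (0b0011 << 12) | (sample & 0x0FFF)
    # One xfer2() call per word, so chip select is released between
    # samples and the DAC latches each value.
    spi.xfer2([(word >> 8) & 0xFF, word & 0xFF])

write_sample(2048)          # mid-scale output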

This was the point I realised that I was at a dead end. I can’t find a way to send the data to the DAC in a way which respects the timing of the audio file. Before I had to do small writes to get the CS line to toggle I could do things fast enough, but not afterwards. Perhaps there’s a DMA option instead, but I haven’t found one yet.

Instead, I think I’m going to go and try PWM based audio. If that doesn’t work, it will be a MAX219 i2c DAC for me!

Introducing GangScan

As some of you might know, I am a Scout Leader. One of the things I do for Scouts is I assist in a minor role with the running of Canberra Gang Show, a theatre production for young people.

One of the things Gang Show cares about is that they need to be able to do rapid roll calls and reporting on who is present at any given time — this is used for working out who is absent before a performance (and therefore needs an understudy), as well as ensuring we know where everyone is in an environment that sometimes has its fire suppression systems isolated.

Before I came along, Canberra Gang Show was doing this with a Windows based attendance tracking application and 125kHz RFID tags. This system worked just fine, except that the software was clunky and there was only one badge reader — we struggled to explain to youth members that they needed to press the “out” button when logging out, and we wanted to be able to have attendance trackers at other locations in the theatre instead of forcing everyone to flow through a single door.

So, I got thinking. How hard can it be to build something a bit better?

Let’s start with some requirements: simple to deploy and manage; free software (both cost and freedom); more badge readers than what we have now; and low cost.

My basic proposal for such a thing is a Raspberry Pi Zero W, with a small LCD screen and an RFID reader. The device scans badges, and displays a confirmation of the scan to the user. If the device can talk to a central server it streams events to it; otherwise it queues them until the server is available and then streams them.
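
GangScan itself isn’t released yet, but the store and forward idea is simple. This is a hypothetical sketch of that behaviour, with the server URL and event format invented for illustration:

# A hypothetical sketch of the queue-then-stream behaviour described above.
import time

import requests

SERVER = 'http://gangserver.local/event'   # invented endpoint
queued = []

def record_scan(badge_id):
    queued.append({'badge': badge_id, 'timestamp': time.time()})
    flush_queue()

def flush_queue():
    # Send whatever we can; keep anything the server didn't accept for later.
    global queued
    remaining = []
    for event in queued:
        try:
            requests.post(SERVER, json=event, timeout=2).raise_for_status()
        except requests.RequestException:
            remaining.append(event)
    queued = remaining

record_scan('badge-1234')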

Sourcing a simple SPI LCD screen and SPI RFID reader from eBay wasn’t too hard, and we were off! The only real wart was that I wanted to use 13.56MHz RFID cards, because then I could store some interesting data (up to 1KB) on the card itself. The first version was simply a ribbon cable:

v0.0, a ribbon cable

Which then led to me having my first PCB ever made. Let’s ignore that it’s the wrong size, shall we?

v0.1, an incorrectly sized PCB

I’m now at the point where the software for the scanner is reasonable, and there is a bare bones server that does enough roll call that it should be functional. I am sure there’s more to be done, but it works enough to demo. One thing I learned while showing off the device at coffee the other day is that it really needs to make a noise when you scan a badge. I’ve ordered a SPI DAC to play with, which might be the solution there. Other next steps include a newer version of the PCB, and some sort of case solution. I’ll do another post when things progress further.

Oh yes, and I’ll eventually release the software too once it’s in a more workable state.

HomeAssistant configuration

I’ve recently been playing with HomeAssistant, which is quite cool. It’s not perfect — for example, it recently broke for me without any debug logs indicating the problem because it no longer wanted to terminate SSL — but it’s better than anything else I’ve seen so far.

Along the way it’s been super handy to be able to refer to other people’s HomeAssistant configurations to see how they got things working. So in that spirit, here’s my current configuration with all of the secrets pulled out. It’s not the most complicated config, but it does do some things which took me a while to get working. Some examples:

  • The Roomba runs when no one is home, and lets me know when its bin is full.
  • A custom component to track when events last occurred so that I can rate limit things like how often the Roomba runs when no one is home.
  • I detect when my wired doorbell goes off, and play a “ding dong” MP3 in the office yurt out the back so I know when someone is visiting.
  • …and probably other things.

I intend to write up interesting things as I think of them, but we’ll see how we go with that.