Exploring more efficient remote large file storage

My primary personal project these days is a thing called Shaken Fist — an infrastructure as a service cloud akin to OpenStack Compute, but smaller and simpler. Shaken Fist doesn’t have an equivalent to the OpenStack Image service, instead letting you describe your instance images with a standard URL. One of the things Shaken Fist does to be easier to use is maintain an official repository of common images, which allows users to refer to those images with a shorthand syntax instead of a complete URL. The images also contain small customizations (mainly the Shaken Fist in-guest agent), which means I can’t just use the official upstream cloud images like OpenStack does.

The images were stored at DreamHost until this week, when a robot decided that they looked like offline backups, despite being served to the Internet via HTTP and being used regularly (although admittedly not frequently). DreamHost unilaterally decided to delete the web site, so now I am looking for new image hosting services, and thinking about better ways to build an image store.

(Oh, and recommending to anyone who asks that they consider using someone less capricious than DreamHost for their hosting needs).


Ansible 7.0 onwards requires blocking IO from stdin, stdout, and stderr

Shaken Fist CI started failing this afternoon with this message logged:

ERROR: Ansible requires blocking IO on stdin/stdout/stderr.
Non-blocking file handles detected: <stdout>

Specifically this was happening when using ansible-galaxy to install some requirements, but the check is more generic than that. It was implemented by this ansible pull request, which appears to have been released with ansible-core 2.14 on November 8. That sat around until today, when ansible 7.0.0 was released and broke CI for me.

To be completely honest I’m not sure what’s happening here — somewhere in GitHub Actions calling a shell script that calls ansible-galaxy, the stdout file descriptor gets set to non-blocking and everything breaks. I’m unsure exactly where, because it’s a pain to track down.

That said, Jack came to the rescue with this gem:

ansible-galaxy install andrewrothstein.etcd-cluster | cat -

Which unblocks me: with the pipe in place, ansible-galaxy’s stdout becomes a fresh pipe to cat (blocking by default), rather than the non-blocking descriptor it would otherwise inherit. It will be interesting to see if other people encounter problems with this change.
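
If you want to poke at this yourself, the Python standard library can report and change the blocking state of a file descriptor directly. Here’s a minimal diagnostic sketch (not anything ansible ships, just os.get_blocking and os.set_blocking from the standard library):

import os
import sys

# Report whether each standard stream's underlying file descriptor
# is in blocking mode.
for stream in (sys.stdin, sys.stdout, sys.stderr):
    print('%s: blocking=%s' % (stream.name, os.get_blocking(stream.fileno())))

# If something earlier left stdout non-blocking, flip it back before
# invoking a tool (like ansible) that insists on blocking IO.
if not os.get_blocking(sys.stdout.fileno()):
    os.set_blocking(sys.stdout.fileno(), True)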

All python packages require a pyproject.toml with modern pip

So last night Shaken Fist CI jobs started failing with errors like this (edited lightly for clarity):

Building wheels for collected packages: shakenfist-ci
  Building wheel for shakenfist-ci (setup.py): started
  Building wheel for shakenfist-ci (setup.py): finished with status 'error'
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [86 lines of output]
...
      ...setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        setuptools.SetuptoolsDeprecationWarning,
      installing to build/bdist.linux-x86_64/wheel
      running install
...
      warning: install_lib: byte-compiling is disabled, skipping.
      
      running install_egg_info
      Copying shakenfist_ci.egg-info to build/bdist.linux-x86_64/wheel/shakenfist_ci-0.0.1.dev2544-py3.7.egg-info
      running install_scripts
      error: invalid command 'bdist_wininst'
      [end of output]

This was pretty concerning. I know that a setup.py / setup.cfg style install is a little old school, but I didn’t expect it to break entirely. At first I thought I’d have to convert to poetry to unblock this, but Chet helpfully pointed out that the fix is as simple as adding a pyproject.toml file to the directory which contains your setup.py and setup.cfg. The basic issue is that modern pip no longer assumes you’re going to use setuptools, so you need to tell it that you are in pyproject.toml. Then you’re unblocked.

So, just create a file named pyproject.toml in the setup.py / setup.cfg directory which contains this:

[build-system]
requires = ["setuptools >= 40.6.0", "wheel"]
build-backend = "setuptools.build_meta"

And you’re good to go. If you’re really curious, this page was quite helpful in working out what was happening.

Debian 10 buster bcrypt pip install breakage

So, as of today my Shaken Fist CI jobs for Debian 10 are failing to install bcrypt, with an error that looks like this:

Running setup.py install for bcrypt: started
    Running setup.py install for bcrypt: finished with status 'error'
    [ ... snip ... ]
    running build_rust
    
        =============================DEBUG ASSISTANCE=============================
        If you are seeing a compilation error please try the following steps to
        successfully install bcrypt:
        1) Upgrade to the latest pip and try again. This will fix errors for most
           users. See: https://pip.pypa.io/en/stable/installing/#upgrading-pip
        2) Ensure you have a recent Rust toolchain installed. bcrypt requires
           rustc >= 1.56.0.
    
        Python: 3.7.3
        platform: Linux-4.19.0-21-amd64-x86_64-with-debian-10.12
        pip: 18.1
        setuptools: 65.2.0
        setuptools_rust: 1.5.1
        rustc: n/a
        =============================DEBUG ASSISTANCE=============================

I’m not really interested in debating why installing a python package requires a rust compiler; that has been discussed elsewhere.

This specific breakage has been caused by bcrypt releasing 4.0.0, which has this in the changelog: “bcrypt is now implemented in Rust. Users building from source will need to have a Rust compiler available. Nothing will change for users downloading wheels.”

Unfortunately, you can’t just install rustc with apt, as the Debian 10 package is both quite big (350mb) and too old (version 1.41.1 versus the required 1.56.0 or better). I also couldn’t find an Ubuntu PPA to misuse to get a more recent rustc.

Another answer here is rustup, which is yet another curl-to-a-root-shell installer, and that isn’t a satisfying answer to me. The other option is of course just to pin bcrypt to pre-4.0.0, but as best I can tell I’d have to do that on every distribution, not just Debian 10.
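
For completeness, that pin would be a one line entry in a requirements file, using standard pip version specifier syntax:

bcrypt<4.0.0  # Debian 10 lacks a new enough rustc to build 4.x from source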

Update: and then I re-read the ChangeLog. It turns out that pip wasn’t offering me wheels because the version of pip was too old. As long as you’re ok with not using an official Debian packaged version of pip, you can do this to get unstuck:

# pip3 install -U pip
# apt-get remove python3-pip
# /usr/local/bin/pip3 install -v bcrypt==4.0.0

Shaken Fist v0.4.2

Shaken Fist v0.4.2 snuck out yesterday as part of shooting this tutorial video. That’s because I really wanted to demonstrate floating IPs, which I only recently got working nicely. Overall in v0.4.2 we:

  • Improved CI for image API calls.
  • Improved upgrade CI testing.
  • Improved network state tracking.
  • Floating IPs now work, and have covering CI. shakenfist#257
  • Resolve leaks of floating IPs from both direct use and NAT gateways. shakenfist#256
  • Resolve leaks of IPManagers on network delete. shakenfist#675
  • Use system packages for ansible during install.

Starting your first instance on Shaken Fist (a video tutorial)

As a bit of an experiment, I’ve made this quick and dirty “vlog” style tutorial video to show you how to install Shaken Fist on a single machine and boot your first instance. I demonstrate how to install, set up your first virtual network, start the instance, inspect events that the instance has experienced, and then log in.

Let me know if you think it’s useful.

Shaken Fist 0.4.1

I don’t blog about every Shaken Fist release here, but I do feel like the 0.4 release (and the subsequent minor bug fix release 0.4.1) are a pretty big deal in the life of the project.

We also got a cool logo during the v0.4 cycle.

The focus of the v0.4 series is reliability — we’ve used behaviour in the continuous integration pipeline as a proxy for that, but it should be a significant improvement in the real world as well. This has included:

  • much more extensive continuous integration coverage, including several new jobs.
  • checksumming image downloads, and retrying images where the checksum fails.
  • reworked locking.
  • etcd reliability improvements.
  • refactoring instances and networks to a new “non-volatile” object model where only immutable values are cached.
  • images now track a state much like instances and networks.
  • a reworked state model for instances, where it’s clearer why an instance ended up in an error state. This is documented in our developer docs.

In terms of new features, we also added:

  • a network ping API, which will emit ICMP ping packets on the network node onto your virtual network. We use this in testing to ensure instances booted and ended up online.
  • networks are now checked to ensure that they have a reasonable minimum size.
  • addition of a simple etcd backup and restore tool (sf-backup).
  • improved data upgrade of previous installations.
  • VXLAN ids are now randomized, and this has forced a new naming scheme for network interfaces and bridges.
  • we are smarter about what networks we restore on startup, and don’t restore dead networks.

We also now require python 3.8.

Overall, Shaken Fist v0.4 makes me much more comfortable running workloads I care about on it than previous releases did. It’s far from perfect, but we’re definitely moving in the right direction.

Rejected talk proposal: Shaken Fist, thought experiments in simpler IaaS clouds

This proposal was submitted for FOSDEM 2021. Given that acceptances were meant to be sent out on 25 December and it’s basically a week later, I think we can assume that it’s been rejected. I’ve recently been writing up my rejected proposals, partially because I’ve put in the effort to write them and they might be useful elsewhere, but also because I think it’s important to demonstrate that it’s not unusual for experienced speakers to be rejected from these events.


OpenStack today is a complicated beast — not only does it try to perform well for large clusters, but it also embraces a diverse set of possible implementations for hypervisors, storage, networking, and more. This was a deliberate tactical choice made by the OpenStack community years ago, forming a so-called “Big Tent” for vendors to collaborate in while building Open Source cloud options. It made a lot of sense at the time, to be honest. However, OpenStack today finds itself constrained by the large number of permutations it must support, ten years of software and backwards compatibility legacy, and decreasing investment from the very vendors that OpenStack courted so actively.

Shaken Fist makes a series of simplifying assumptions that allow it to achieve a surprisingly large amount in not a lot of code. For example, it supports only one hypervisor, one hypervisor OS, and one networking implementation, and lacks an image service. It tries hard to be respectful of compute resources while idle, and as fast as possible to deploy resources when requested — it’s entirely possible to deploy a new VM and start it booting in less than a second, for example (if the boot image is already held in cache). Shaken Fist is likely a good choice for small deployments such as home labs and telco edge applications. It is unlikely to be a good choice for large scale compute, however.

Shaken Fist 0.2.0

The other day we released Shaken Fist version 0.2, and I never got around to announcing it here. In fact, we’ve done a minor release since then and have another minor release in the wings ready to go out in the next day or so.

So what’s changed in Shaken Fist between version 0.1 and 0.2? Well, actually kind of a lot…

  • We moved from MySQL to etcd for storage of persistent state. This was partially done because we wanted distributed locking, but also because MySQL was a pain to work with.
  • We rearranged our repositories — the main repository is now in its own GitHub organisation, and the golang REST client, terraform provider, and deployment tooling have moved into their own repositories in that organisation. There is also a prototype javascript client.
  • Some work has gone into making the API service more production grade, although there is still some work to be done there, probably in the 0.3 release — specifically, there is a timeout if a response takes more than 300 seconds, which can be the case when launching large VMs whose disk images are not already in cache.

There were also some important features added:

  • Authentication of API requests.
  • Resource ownership.
  • Namespaces (a bit like Kubernetes namespaces or OpenStack projects).
  • Resource tagging, called metadata.
  • Support for local mirroring of common disk images.
  • …and a large number of bug fixes.

Shaken Fist is also now packaged on pypi, and the deployment tooling knows how to install from packages as well as source if that’s a thing you’re interested in. You can read more at shakenfist.com, but that site is a bit of a work in progress at the moment. The new github organisation is at github.com/shakenfist.

The KSM and I

I spent much of yesterday playing with KSM (Kernel Shared Memory, or Kernel Samepage Merging depending on which universe you come from). Unix kernels store memory in “pages” which are moved in and out of memory as a single block. On most Linux architectures pages are 4,096 bytes long.

KSM is a Linux kernel feature which scans memory looking for identical pages and de-duplicates them. So instead of having two pages, we have just one, with two processes pointing at that same page. This has obvious advantages if you’re storing lots of repeating data. Why would you be doing such a thing? Well, the traditional answer is virtual machines.

Take my employer’s systems for example. We manage virtual learning environments for students, where every student gets a set of virtual machines to do their learning thing on. So, if we have 50 students in a class, we have 50 sets of the same virtual machine. That’s a lot of duplicated memory. The promise of KSM is that instead of storing the same thing 50 times, we can store it once and therefore fit more virtual machines onto a single physical machine.

For my experiments I used libvirt / KVM on Ubuntu 18.04. To ensure KSM was turned on, I needed to:

  • Ensure KSM is turned on. /sys/kernel/mm/ksm/run should contain a “1” if it is enabled. If it is not, just write “1” to that file to enable it.
  • Ensure libvirt is enabling KSM. The KSM value in /etc/default/qemu-kvm should be set to “AUTO”.
  • Check KSM metrics:
# grep . /sys/kernel/mm/ksm/*
/sys/kernel/mm/ksm/full_scans:891
/sys/kernel/mm/ksm/max_page_sharing:256
/sys/kernel/mm/ksm/merge_across_nodes:1
/sys/kernel/mm/ksm/pages_shared:0
/sys/kernel/mm/ksm/pages_sharing:0
/sys/kernel/mm/ksm/pages_to_scan:100
/sys/kernel/mm/ksm/pages_unshared:0
/sys/kernel/mm/ksm/pages_volatile:0
/sys/kernel/mm/ksm/run:1
/sys/kernel/mm/ksm/sleep_millisecs:200
/sys/kernel/mm/ksm/stable_node_chains:49
/sys/kernel/mm/ksm/stable_node_chains_prune_millisecs:2000
/sys/kernel/mm/ksm/stable_node_dups:1055
/sys/kernel/mm/ksm/use_zero_pages:0
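
The two interesting counters here are pages_shared (the number of physical pages KSM is using to back shared content) and pages_sharing (the number of additional references those backing pages service, which is effectively your saving). A minimal sketch to turn those counters into a megabytes figure:

import os

KSM = '/sys/kernel/mm/ksm'
PAGE_SIZE = os.sysconf('SC_PAGE_SIZE')  # 4096 bytes on most Linux systems

def ksm_stat(name):
    # Each sysfs file holds a single integer counter.
    with open(os.path.join(KSM, name)) as f:
        return int(f.read())

# pages_sharing approximates the duplicate pages we avoided storing,
# so multiplying by the page size gives the memory saved.
saved_mb = ksm_stat('pages_sharing') * PAGE_SIZE / (1024 * 1024)
print('KSM is saving about %.1fmb of RAM' % saved_mb)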

My lab machines are currently set up with Shaken Fist, so I just quickly launched a few hundred identical VMs. The first graph is that experiment. It’s a little hard to see here, but on three machines I consumed about 40gb of RAM with identical VMs and then waited. After three or so hours I had saved about 2,500 pages of memory.

To be honest, that’s a pretty disappointing result. 2,500 4kb pages is only about 10mb of RAM, which isn’t very much at all. Also, three hours is a really long time for our workload, where students often fire up their labs for a couple of hours at a time before shutting them down again. If this was as good as KSM gets, it wasn’t for us.

After some pondering, I realised that KSM is configured by default to not work very well. The default value for pages_to_scan is 100, which means each scan run only inspects about half a megabyte of RAM — it would take a very, very long time to scan a modern machine that way. So I tried setting pages_to_scan to 1,000,000,000 instead. One billion is an unreasonably large number for the real world, but hey. You update this number by writing a new value to /sys/kernel/mm/ksm/pages_to_scan.
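
For example (a sketch; run as root, and note the new value takes effect on subsequent scans but doesn’t persist across reboots):

# Run as root. The value is not persistent, so you'd need a systemd
# unit or similar to reapply it after a reboot.
with open('/sys/kernel/mm/ksm/pages_to_scan', 'w') as f:
    f.write('1000000000')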

This time we get a much better result — I launched as many VMs as would fit on each machine, and then sat back and waited (well, went to bed actually). Again the graph is a bit hard to read, but what it is saying is that after 90 minutes KSM had saved me over 300gb of RAM across the three machines. It’s still a little too slow for our workload, but for workloads where the VMs are relatively static, that’s a real saving.

Now it should be noted that setting pages_to_scan to 1,000,000,000 comes at a cost — each of these machines now has one of its 48 cores dedicated to scanning memory and de-duplicating. For my workload that’s something I’m ok with because my workload is not CPU bound, but it might not work for you.