Gang Scan is an open source (and free) attendance tracking system based on custom RFID reader boards that communicate back to a server over wifi. The boards are capable of queueing scan events in the case of intermittent network connectivity, and the server provides simple reporting.
The other day we released Shaken Fist version 0.2, and I never got around to announcing it here. In fact, we’ve done a minor release since then and have another minor release in the wings ready to go out in the next day or so.
So what’s changed in Shaken Fist between version 0.1 and 0.2? Well, actually kind of a lot…
- We moved from MySQL to etcd for storage of persistent state. This was partially done because we wanted distributed locking, but it was also because MySQL was a pain to work with.
- Some work has gone into making the API service more production grade, although there is still some work to be done there, probably in the 0.3 release. Specifically, there is a timeout if a response takes more than 300 seconds, which can be the case when launching large VMs where the disk images are not in cache.
There were also some important features added:
- Authentication of API requests.
- Resource ownership.
- Namespaces (a bit like Kubernetes namespaces or OpenStack projects).
- Resource tagging, called metadata.
- Support for local mirroring of common disk images.
- …and a large number of bug fixes.
Shaken Fist is also now packaged on PyPI, and the deployment tooling knows how to install from packages as well as source if that’s a thing you’re interested in. You can read more at shakenfist.com, but that site is a bit of a work in progress at the moment. The new GitHub organisation is at github.com/shakenfist.
I spent much of yesterday playing with KSM (Kernel Shared Memory, or Kernel Samepage Merging depending on which universe you come from). Unix kernels store memory in “pages” which are moved in and out of memory as a single block. On most Linux architectures pages are 4,096 bytes long.
KSM is a Linux kernel feature which scans memory looking for identical pages, and then de-duplicates them. So instead of having two pages, we just have one and have two processes point at that same page. This has obvious advantages if you’re storing lots of repeating data. Why would you be doing such a thing? Well the traditional answer is virtual machines.
Take my employer’s systems for example. We manage virtual learning environments for students, where every student gets a set of virtual machines to do their learning thing on. So, if we have 50 students in a class, we have 50 sets of the same virtual machine. That’s a lot of duplicated memory. The promise of KSM is that instead of storing the same thing 50 times, we can store it once and therefore fit more virtual machines onto a single physical machine.
For my experiments I used libvirt / KVM on Ubuntu 18.04. To get KSM working, I needed to:
- Ensure KSM is turned on. /sys/kernel/mm/ksm/run should contain a “1” if it is enabled. If it does not, just write “1” to that file to enable it.
- Ensure libvirt is enabling KSM. The KSM value in /etc/default/qemu-kvm should be set to “AUTO”.
- Check KSM metrics:
# grep . /sys/kernel/mm/ksm/*
/sys/kernel/mm/ksm/full_scans:891
/sys/kernel/mm/ksm/max_page_sharing:256
/sys/kernel/mm/ksm/merge_across_nodes:1
/sys/kernel/mm/ksm/pages_shared:0
/sys/kernel/mm/ksm/pages_sharing:0
/sys/kernel/mm/ksm/pages_to_scan:100
/sys/kernel/mm/ksm/pages_unshared:0
/sys/kernel/mm/ksm/pages_volatile:0
/sys/kernel/mm/ksm/run:1
/sys/kernel/mm/ksm/sleep_millisecs:200
/sys/kernel/mm/ksm/stable_node_chains:49
/sys/kernel/mm/ksm/stable_node_chains_prune_millisecs:2000
/sys/kernel/mm/ksm/stable_node_dups:1055
/sys/kernel/mm/ksm/use_zero_pages:0
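If you would rather collect these counters from a script than from grep, something like the following works. This is a minimal sketch, assuming Python 3 and the sysfs layout shown above. The function names read_ksm_stats and ksm_saved_bytes are my own, and the 4,096 byte page size is an assumption that holds on most Linux architectures. The kernel documentation describes pages_sharing as the number of extra sites sharing a merged page, which is why it is used here as the estimate of pages saved.

```python
import os

KSM_DIR = "/sys/kernel/mm/ksm"
PAGE_SIZE = 4096  # assumption: 4,096 byte pages


def read_ksm_stats(ksm_dir=KSM_DIR):
    """Return the integer KSM counters found in ksm_dir as a dict."""
    stats = {}
    for name in sorted(os.listdir(ksm_dir)):
        path = os.path.join(ksm_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as f:
            value = f.read().strip()
        # Skip anything that is not a plain non-negative integer.
        if value.isdigit():
            stats[name] = int(value)
    return stats


def ksm_saved_bytes(stats, page_size=PAGE_SIZE):
    """Estimate memory saved: each page counted in pages_sharing is one page not stored."""
    return stats.get("pages_sharing", 0) * page_size
```

Calling read_ksm_stats() with no arguments reads the live counters, and read_ksm_stats()["run"] == 1 tells you whether KSM is enabled. The directory is a parameter so that the parsing can be tried against a copy of the files on a machine without KSM.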
My lab machines are currently set up with Shaken Fist, so I just quickly launched a few hundred identical VMs. This first graph is that experiment. It’s a little hard to see here, but on three machines I consumed about 40gb of RAM with identical VMs and then waited. After three or so hours I had saved about 2,500 pages of memory.
To be honest, that’s a pretty disappointing result. 2,500 4kb pages is only about 10mb of RAM, which isn’t very much at all. Also, three hours is a really long time for our workload, where students often fire up their labs for a couple of hours at a time before shutting them down again. If this was as good as KSM gets, it wasn’t for us.
After some pondering, I realised that KSM is configured by default to not work very well. The default value for pages_to_scan is 100, which means each scan run only inspects about half a megabyte of RAM. It would take a very very long time to scan a modern machine that way. So I tried setting pages_to_scan to 1,000,000,000 instead. One billion is an unreasonably large number for the real world, but hey. You update this number by writing a new value to /sys/kernel/mm/ksm/pages_to_scan.
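To put rough numbers on that, here is the arithmetic as a small sketch. It assumes 4,096 byte pages and uses the pages_to_scan and sleep_millisecs values from the metrics listed earlier. It also assumes, for illustration only, that a single machine holds 40gb of mergeable memory. The function names are mine, and the result is a lower bound because it counts only the sleeps between batches, not the time spent comparing pages.

```python
PAGE_SIZE = 4096  # bytes per page, as on most Linux architectures


def bytes_per_batch(pages_to_scan, page_size=PAGE_SIZE):
    """Memory inspected by one KSM batch."""
    return pages_to_scan * page_size


def min_seconds_per_pass(total_bytes, pages_to_scan, sleep_millisecs,
                         page_size=PAGE_SIZE):
    """Lower bound on one full pass: only the sleeps between batches are counted."""
    batches = total_bytes / bytes_per_batch(pages_to_scan, page_size)
    return batches * sleep_millisecs / 1000.0


if __name__ == "__main__":
    # Default settings: 100 pages per batch, 200 ms sleep between batches.
    print(bytes_per_batch(100))  # 409600 bytes, a bit under half a megabyte
    hours = min_seconds_per_pass(40 * 2**30, 100, 200) / 3600
    print("%.1f hours for one pass over 40gb" % hours)
    # The first experiment's saving: 2,500 pages of 4kb each.
    print(2500 * PAGE_SIZE)  # 10240000 bytes, about 10mb
```

With the defaults a single pass over 40gb takes the better part of six hours at minimum, which fits with seeing almost no savings after three hours. With pages_to_scan set to one billion a single batch covers far more memory than the machine has, so the sleep stops being the limit and the CPU becomes the bottleneck instead.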
This time we get a much better result. I launched as many VMs as would fit on each machine, and then sat back and waited (well, went to bed actually). Again the graph is a bit hard to read, but what it is saying is that after 90 minutes KSM had saved me over 300gb of RAM across the three machines. It’s still a little too slow for our workload, but for workloads where the VMs are relatively static that’s a real saving.
Now it should be noted that setting pages_to_scan to 1,000,000,000 comes at a cost — each of these machines now has one of its 48 cores dedicated to scanning memory and deduplicating. For my workload that’s something I am ok with because my workload is not CPU bound, but it might not work for you.
The first public commit to what would become OpenStack Nova was made ten years ago today — at Thu May 27 23:05:26 2010 PDT to be exact. So first off, happy tenth birthday to Nova!
A lot has happened in that time: OpenStack has gone from being two separate Open Source projects to a whole ecosystem, developers have come and gone (and passed away), and OpenStack has weathered the cloud wars of the last decade. OpenStack survived its early growth phase by deliberately offering a “big tent” to the community and associated vendors, with an expansive definition of what should be included. This has resulted in most developers being associated with a corporate sponsor, and hence the decrease in the number of developers today as corporate interest wanes. OpenStack has never been great at attracting or retaining hobbyist contributors.
My personal involvement with OpenStack started in November 2011, so while I missed the very early days I was around for a lot and made many of the mistakes that I now see in OpenStack.
What do I see as mistakes in OpenStack in hindsight? Well, embracing vendors who later lose interest has been painful, and has increased the complexity of the code base significantly. Nova itself is now nearly 400,000 lines of code, and that’s after splitting off many of the original features of Nova such as block storage and networking. Additionally, a lot of our initial assumptions are no longer true — for example in many cases we had to write code to implement things, where there are now good libraries available from third parties.
That’s not to say that OpenStack is without value — I am a daily user of OpenStack to this day, and use at least three OpenStack public clouds at the moment. That said, OpenStack is a complicated beast with a lot of legacy that makes it hard to maintain and slow to change.
For at least six months I’ve felt the desire for a simpler cloud orchestration layer, both for my own personal uses, and also as a test bed for ideas for what a smaller, simpler cloud might look like. My personal use case involves a relatively small environment which echoes what we now think of as edge compute: less than 10 RU of machines with a minimum of orchestration and management overhead.
At the time that I was thinking about these things, the Australian bushfires and COVID-19 came along, and presented me with a lot more spare time than I had expected to have. While I’m still blessed to be employed, all of my social activities have been cancelled, so I find myself at home at a loose end on weekends and evenings a lot more than before.
Thus Shaken Fist was born. Named for a Simpsons meme, Shaken Fist is a deliberately small and highly opinionated cloud implementation aimed at working well in small deployments such as homes, labs, edge compute locations, deployed systems, and so forth.
I’ve taken a bit of trouble with each feature in Shaken Fist to think through what the simplest and highest value way of doing something is. For example, instances always get a config drive and there is no metadata server. There is also only one supported type of virtual networking, and one supported hypervisor. That said, this means Shaken Fist is less than 5,000 lines of code, and small enough that new things can be implemented very quickly by a single middle aged developer.
Shaken Fist definitely has feature gaps — API authentication and scheduling are the most obvious at the moment — but I have plans to fill those when the time comes.
I’m not sure if Shaken Fist is useful to others, but you never know. It’s Apache2 licensed, and available on GitHub if you’re interested.
This is the third in a series of posts documenting my adventures in making bread during the COVID-19 shutdown. I’d like to imagine I was running science experiments in making bread on my kids, but really all I was trying to do was eat some toast.
I’m not sure what it was like in other parts of the world, but during the COVID-19 pandemic Australia suffered a bunch of shortages: toilet paper, flour, and yeast were among those things stores simply didn’t have any stock of. Luckily we’d only just done a Costco shop so were ok for toilet paper and flour, but we were definitely getting low on yeast. The obvious answer is a sourdough starter, but I’d never done that thing before.
In the end my answer was to cheat and use this recipe. However, I found the instructions unclear, so here’s what I ended up doing:
- 2 cups of warm water
- 2 teaspoons of dry yeast
- 2 cups of bakers flour
Mix these three items together in a plastic container with enough space for the mix to double in size. Place in a warm place (on the bench on top of the dish washer was our answer), and cover with cloth secured with a rubber band.
Once a day you should feed your starter with 1 cup of flour and 1 cup of warm water. Stir thoroughly.
The recipe online says to feed for five days, but the size of my starter was getting out of hand by a couple of days, so I started baking at that point. I’ll describe the baking process in a later post. The early loaves definitely weren’t as good as the more recent ones, but they were still edible.
Once the starter is going, you feed daily and probably need to bake daily to keep the starter’s size under control. That obviously doesn’t work so great if you can’t eat an entire loaf of bread a day. You can hibernate the starter by putting it in the fridge, which means you only need to feed it once a week.
To wake a hibernated starter up, take it out of the fridge and feed it. I do this at 8am. That means I can then start the loaf for baking at about noon, and the starter can either go back in the fridge until next time or stay on the bench being fed daily.
I have noticed that sometimes the starter comes out of the fridge with a layer of dark water on top. It’s worked out ok for us to just ignore that and stir it into the mix as part of the feeding process. Hopefully we won’t die.
This is the second in a series of posts documenting my adventures in making bread during the COVID-19 shutdown. Yes I know all the cool kids made bread for themselves during the shutdown, but I did it too!
So here we were, in the middle of a pandemic which closed bakeries and cancelled almost all of my non-work activities. I found this animated GIF on Reddit for a super simple no-knead bread and decided to give it a go. It turns out that a few things are true:
- animated GIFs are a super terrible way to store recipes
- that animated GIF was an export of this YouTube video which originally accompanied this blog post
- and that I only learned these things while trying to work out who to credit for this recipe
The basic recipe is really easy: chuck the following into a big bowl, stir, and then cover with a plate. Leave resting in a warm place for a long time (three or four hours), then turn out onto a floured bench. Fold into a ball with flour, and then bake. You can see a more detailed version in the YouTube video above.
- 3 cups of bakers flour (not plain white flour)
- 2 teaspoons of yeast
- 2 teaspoons of salt
- 1.5 cups of warm water (again, I use 42 degrees from my gas hot water system)
The dough will seem really dry when you first mix it, but gets wetter as it rises. Don’t panic if it seems tacky and dry.
I think the key here is the baking process, which is how the oven loaf in my previous post about bread maker white loaves was baked. I use a cast iron camp oven (sometimes called a dutch oven), because thermal mass is key. If I had a fancy enamelled cast iron camp oven I’d use that, but I don’t and I wasn’t going shopping during the shutdown to get one. Oh, and they can be crazy expensive at up to $500 AUD.
Warm the oven with the camp oven inside for at least 30 minutes at 230 degrees Celsius. Then place the dough inside the camp oven on some baking paper. I tend to use a trivet as well, but I think you could skip that if you didn’t have one. Bake for 30 minutes with the lid on, which helps steam the bread a little and forms a nice crust. Then bake for another 12 minutes with the camp oven lid off, which darkens the crust up nicely.
Oh, and I’ve noticed a bit of variation in how wet the dough seems to be when I turn it out and form it in flour, but it doesn’t really seem to change the outcome once baked, so that’s nice.
The original blogger for this recipe also recommends chilling the dough overnight in the fridge before baking, but I haven’t tried that yet.
My dad asked me to document some of my baking experiments from the recent natural disasters, which I wanted to do anyway so that I could remember the recipes. It’s taken me a while to get around to it though, because animated GIFs on Reddit are a terrible medium for recipe storage, and because I’ve been distracted with other shiny objects. That said, let’s start with the basics: a breadmaker bread that my kids will actually eat.
This recipe took a bunch of iterations to get right over the last year or so, but I’ll spare you the long boring details. However, I suspect part of the problem is that the recipe varies by bread maker. Oh, and the salt is really important, so don’t skip the salt!
Wet ingredients (add first)
- 1.5 cups of warm water (we have an instantaneous gas hot water system, so I pick 42 degrees)
- 0.25 cups of oil (I use bran oil)
Dry ingredients (add second)
I just kind of chuck these in, although I tend to put the non-flour ingredients in a corner together for reasons that I can’t explain.
- 3.5 cups of bakers flour (must be bakers flour, not plain flour)
- 2 teaspoons of instant yeast (we keep it in the freezer in a big packet, not the sachets)
- 4 teaspoons of white sugar
- 1 teaspoon of salt
- 2 teaspoons of bread improver
I then just let my bread maker do its thing, which takes about three hours including baking. If I am going to bake the bread in the oven, then the dough takes about two hours, but I let the dough rise for another 30 to 60 minutes before baking.
I think to be honest that the result is better from the oven, but a little more work. The bread maker loaves are a bit prone to collapsing (you can see it starting on the example above), and there is a big kneading hook indent in the middle of the bottom of the loaf.
The oven baking technique took a while to develop, but I’ll cover that in a later post.
Today I wandered into a bit of a rat hole discovering how to export data from OpenStack Cinder volumes when you don’t have admin permissions, and I thought it was worth documenting here so I remember it for next time.
Let’s assume that you have a Cinder volume named “child1”, which is a 64gb volume originally cloned from “parent1”. parent1 is a 7.9gb VMDK, but the only way I can find to extract child1 is to convert it to a glance image and then download the entire volume as a raw. Something like this:
$ cinder upload-to-image $child1 "extract:$child1"
Where $child1 is the UUID of the Cinder volume. You then need to find the UUID of the image in Glance, which the Cinder upload-to-image command will have told you, but you can also find by searching Glance for your image named “extract:$child1”:
$ glance image-list | grep "extract:$child1"
You now need to watch that Glance image until the status of the image is “active”. It will go through a series of steps with names like “queued”, and “uploading” first.
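Watching the image by hand gets old quickly, so a small polling loop helps. This is a minimal sketch, assuming Python 3. The get_status callable is hypothetical: it stands in for whatever you use to fetch the current status string of the image, for example a wrapper around the Glance client. The name wait_for_status, its parameters, and the choice of “killed” and “deleted” as failure statuses are my own assumptions, not part of any OpenStack API.

```python
import time


def wait_for_status(get_status, wanted="active", failed=("killed", "deleted"),
                    interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll get_status() until it returns wanted; raise on failure or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == wanted:
            return status
        if status in failed:
            # Assumed failure statuses; adjust to suit your cloud.
            raise RuntimeError("image entered status %r" % status)
        if time.monotonic() >= deadline:
            raise TimeoutError("gave up waiting, last status was %r" % status)
        sleep(interval)


if __name__ == "__main__":
    # Demonstration with a fake status source instead of a real Glance call.
    fake = iter(["queued", "uploading", "uploading", "active"])
    print(wait_for_status(lambda: next(fake), interval=0, sleep=lambda s: None))
```

The sleep function is passed in as a parameter so that the loop can be exercised without actually waiting.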
Now you can download the image from Glance:
$ glance image-download --file images/$child1.raw --progress $glance_uuid
And then delete the intermediate glance image:
$ glance image-delete $glance_uuid
I have a bad sample script which does this in my junk code repository if that is helpful.
What you have at the end of this is a 64gb raw disk file in my example. You can convert that file to qcow2 like this:
$ qemu-img convert -f raw -O qcow2 $child1.raw $child1.qcow2
But you’re left with a 64gb qcow2 file for your troubles. I experimented with virt-sparsify to reduce the size of this image, but it didn’t work in my case (no space was saved), I suspect because the disk image has multiple partitions because it originally came from a VMware environment.
Luckily qemu-img can also re-create the COW layer that existed on the admin-only side of the public cloud barrier. You do this by rebasing the converted qcow2 file onto the original VMDK file like this:
$ qemu-img create -f qcow2 -b $child1.qcow2 $child1.delta.qcow2
$ qemu-img rebase -b $parent1.vmdk $child1.delta.qcow2
In my case I ended up with a 289mb $child1.delta.qcow2 file, which isn’t too shabby. It took about five minutes to produce that delta on my Google Cloud instance from a 7.9gb backing file and a 64gb upper layer.
Winner of a Hugo, a Locus and a Nebula, this book is about a mathematical prodigy battling her way into a career as an astronaut in a post-apocalyptic 1950s America. Along the way she has to take on the embedded sexism of America in the 50s, as well as her own mild racism. Worse, she suffers from an anxiety condition.
The book is engaging and well written, with an alternative history plot line which is believable and interesting. In fact, it’s quite topical for our current time.
I really enjoyed this book and I will definitely be reading the sequel.
I have a need at the moment to know where my users are in the world. This helps me to identify what compute resources to serve their request with in order to reduce the latency they experience. So how do you do that thing with Google Cloud?
The first step is to set up a series of test backends to send traffic to. I built three regions: Sydney; London; and Los Angeles. It turns out in hindsight that wasn’t actually necessary though, as this would work with a single backend just as well. For my backends I chose a minimal Ubuntu install, running this simple backend HTTP service.
I had some initial trouble finding a single page which walked through the setup of the Google Cloud load balancer to do what I wanted, which is the main reason for writing this post. The steps are:
Create your test instances and configure the backend on them. I ended up with a setup like this:
Next, set up instance groups to contain these instances. I chose unmanaged instance groups (that is, I don’t want autoscaling). You need to create one per region.
But wait! There’s one more layer of abstraction. We need a backend service. The configuration for these is cunningly hidden on the load balancing page, on a separate tab. Create a service which contains our three instance groups:
I’ve also added a health check to my service, which just requests “/healthz” from each instance and expects a response of “OK” for healthy backends.
The backend service is also where we configure our extra headers. Click on the “advanced configurations” link, and more options appear:
Here I set up the extra HTTP headers the load balancer should insert: X-Region; X-City; and X-Lat-Lon.
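Once a backend receives these headers, it can use them to decide which region is closest to the user. This is a minimal sketch, assuming Python 3 and assuming that the X-Lat-Lon value arrives as “latitude,longitude” in decimal degrees. The region names, the approximate city coordinates and the function names are all my own for illustration.

```python
import math

# Approximate city centre coordinates for the three test regions (assumed).
REGIONS = {
    "sydney": (-33.8688, 151.2093),
    "london": (51.5074, -0.1278),
    "los-angeles": (34.0522, -118.2437),
}


def parse_lat_lon(header_value):
    """Parse an assumed 'lat,lon' header value into a pair of floats."""
    lat, lon = (float(part) for part in header_value.split(","))
    return lat, lon


def haversine_km(a, b):
    """Great circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_region(header_value, regions=REGIONS):
    """Return the name of the region closest to the user's reported location."""
    here = parse_lat_lon(header_value)
    return min(regions, key=lambda name: haversine_km(here, regions[name]))
```

Geographic distance is only a rough proxy for network latency, so treat the answer as a starting point rather than a guarantee.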
And finally we can configure the load balancer. I selected an “HTTP(S) load balancer”, as I only care about incoming HTTP and HTTPS traffic. Obviously you set the load balancer to route traffic from the Internet to your VMs, and you select your backend service as the backend of the load balancer.
Now we can test! If I go to my load balancer in a web browser, I now get a result like this:
The top part of the page is just the HTTP headers from the request. You can see that we’re now getting helpful location headers. Mission accomplished!
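If you want to try this without my backend, a stand-in is only a few lines. This is a minimal sketch using only the Python 3 standard library, and it is not the service linked earlier. It answers “OK” on /healthz for the health check and echoes the request headers for every other path. The names render_page, EchoHandler and serve, and the default port of 8080, are assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_page(path, headers):
    """Return (status, body) for a request path and an iterable of header pairs."""
    if path == "/healthz":
        return 200, "OK"
    lines = ["%s: %s" % (name, value) for name, value in headers]
    return 200, "\n".join(lines) + "\n"


class EchoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = render_page(self.path, self.headers.items())
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Listen on all interfaces and serve requests until interrupted."""
    HTTPServer(("", port), EchoHandler).serve_forever()
```

Call serve() on each instance to start it. Behind the load balancer, the echoed headers should then include X-Region, X-City and X-Lat-Lon alongside the usual request headers.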
This post is an attempt to collect a set of general hints and tips for resumes and interviews. It is not concrete truth though. Like all things, this process is subjective and will differ from place to place. It originally started as a Google doc shared around a previous workplace during some layoffs, but it seems more useful than that so I am publishing it publicly.
I’d welcome comments if you think it will help others.
So something bad happened
I have the distinction of having been through layoffs three times now. I think there are some important first steps:
- Take a deep breath.
- Hug your loved ones and then go and sweat on something — take a walk, go to the gym, whatever works for you. Research shows that exercise is a powerful mood stabiliser.
- Make a plan. Who are you going to apply with? Who could refer you? What do you want to do employment wise? Updating your resume is probably a good first step in that plan.
- Treat finding a job as your job. You probably can’t do it for eight hours a day, but it should be your primary goal for each “workday”. Have a todo list, track things on that list, and keep track of status.
And remember, being laid off isn’t about you, it is about things outside your control. Don’t take it as a reflection on your abilities.
- The goal of a resume is to get someone to want to interview you. It is not meant to be a complete description of everything you’ve done. So, keep it short and salesy (without lying through oversimplification!).
- Resumes are also cultural: US firms tend to expect a short summary (two pages), while Australian firms seem to expect something longer and more detailed. So, ask your friends if you can see their resumes to get a sense of the right style for the market you’re operating in. It is possible you’ll end up with more than one version if you’re applying in two markets at once.
- Speaking of friends, referrals are gold. Perhaps look through your LinkedIn and other social media and see where people you’ve formerly worked with are now. If you have a good reputation with someone and they’re somewhere cool, ask them to refer you for a job. It might not work, but it can’t hurt.
- Ratings for skills on LinkedIn help recruiters find you. So perhaps rate your friends for things you think they’re good at and then ask them to return the favour?
Interviews in general
The soft interview questions we all get asked:
- I would expect to be asked what I’ve done in my career — an “introduce yourself” moment. So try and have a coherent story that is short but interesting — “I’m a system admin who has been working on cloud orchestration and software defined networking for Australia’s largest telco” for example.
- You will probably be asked why you’re looking for work too. I think there’s no shame in honesty here, something like “I worked for a small systems integrator that did amazing things, but the main customer has been doing large layoffs and stopped spending”.
- You will also probably be asked why you want this job / want to work with this company. While everyone really knows it is because you enjoy having money, find other things beforehand to say instead. “I want to work with Amazon because I love cloud, Amazon is kicking arse in that space, and I hear you have great people I’d love to work with”.
Note here: the original version of the above point said “I’d love to learn from”, but it was mentioned on Facebook that the flow felt one way there. It has been tweaked to express a desire for a two way flow of learning.
“What have you done” questions: the reality is that almost all work is collaborative these days. So, have some stories about things you’ve personally done and are proud of, but also have some stories of delivering things bigger than one person could do. For example, perhaps the ansible scripts for your project were super cool and mostly you, but perhaps you should also describe how the overall project was important and wouldn’t have worked without your bits.
Silicon Valley interviews: organizations like Google, Facebook, et cetera want to be famous for having hard interviews. Google will deliberately probe until they find an area you don’t know about and then dig into that. Weirdly, they’re not doing that to be mean — they’re trying to gauge how you respond to new situations (and perhaps stress). So, be honest if you don’t know the answer, but then offer up an informed guess. For example, I used to ask people about system calls and strace. We’d keep going until we hit the limit of what they understood. I’d then stop and explain the next layer and then ask them to infer things — “assuming that things work like this, how would this probably work”? It is important to not panic!
Interviews as a sysadmin
- Interviewers want to know about your attitude as well as your skills. As sysadmins, sometimes we are faced with high pressure situations — something is down and we need to get it back up and running ASAP. Have a story ready to tell about a time something went wrong. You should demonstrate that you took the time to plan before acting, even in an emergency scenario. Don’t leave the interviewer thinking you’ll be the guy who will accidentally delete everyone’s data because you’re in a rush.
- An understanding of how the business functions and why “IT” is important is needed. For example, if you get asked to explain what a firewall is, be sure to talk about how it relates to “security policy” as well as the technical elements (ports, packet inspection & whatnot).
- Your ability to learn new technologies is as important as the technologies you already know.
Interviews as a developer
- I think people look for curiosity here. Everyone will encounter new things, so they want to hear that you like learning, are a self starter, and can do new stuff. So for example, if you’ve just done the CKA exam and passed, that would be a great example.
- You need to have examples of things you have built and why those were interesting. Was the thing poorly defined before you built it? Was it experimental? Did it have a big impact for the customer?
- An open source portfolio can really help — it means people can concretely see what you’re capable of instead of just playing 20 questions with you. If you don’t have one, don’t start new projects — go find an existing project to contribute to. It is much more effective.