Linux bridges have their MTU overwritten when you add an interface

I discovered last night that network bridges on Linux have their Maximum Transmission Unit (MTU) overwritten by the MTU of the most recent interface added to the bridge. This is bad. Very bad. Specifically, it is bad because the MTU describes the capabilities of the network path the packets will travel on, so it shouldn’t be clobbered willy-nilly.

Here’s an example of the behaviour:

# ip link add egr-br-ens1f0 mtu 1500 type bridge
# ip link show dev egr-br-ens1f0
3: egr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:33:1b:30:d8:00 brd ff:ff:ff:ff:ff:ff
# ip link add egr-eaa64a-o mtu 8950 type veth peer name egr-eaa64a-i
# ip link show dev egr-br-ens1f0
3: egr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:33:1b:30:d8:00 brd ff:ff:ff:ff:ff:ff
# brctl addif egr-br-ens1f0 egr-eaa64a-o
# ip link show dev egr-br-ens1f0
3: egr-br-ens1f0: <BROADCAST,MULTICAST> mtu 8950 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether da:82:cf:34:13:60 brd ff:ff:ff:ff:ff:ff

So you can see here that the bridge had an MTU of 1,500 bytes. We create a veth pair with an MTU of 8,950 bytes and add it to the bridge. Suddenly the bridge’s MTU is 8,950 bytes!

Perhaps this is my fault — brctl is pretty old school. Let’s use only ip commands to configure the bridge.

# ip link add mgr-br-ens1f0 mtu 1500 type bridge
# ip link show dev mgr-br-ens1f0
6: mgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 82:d8:df:15:40:01 brd ff:ff:ff:ff:ff:ff
# ip link add mgr-eaa64a-o mtu 8950 type veth peer name mgr-eaa64a-i
# ip link show dev mgr-br-ens1f0
6: mgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 82:d8:df:15:40:01 brd ff:ff:ff:ff:ff:ff
# ip link set mgr-eaa64a-o master mgr-br-ens1f0
# ip link show dev mgr-br-ens1f0
6: mgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 8950 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 22:55:4a:a8:19:00 brd ff:ff:ff:ff:ff:ff

The same problem occurs. Luckily, you can specify the MTU when you add an interface to a bridge, like this:

# ip link add zgr-br-ens1f0 mtu 1500 type bridge
# ip link show dev zgr-br-ens1f0
9: zgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7a:54:2c:04:5f:a8 brd ff:ff:ff:ff:ff:ff
# ip link add zgr-eaa64a-o mtu 8950 type veth peer name zgr-eaa64a-i
# ip link show dev zgr-br-ens1f0
9: zgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7a:54:2c:04:5f:a8 brd ff:ff:ff:ff:ff:ff
# ip link set zgr-eaa64a-o master zgr-br-ens1f0 mtu 1500
# ip link show dev zgr-br-ens1f0
9: zgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:59:0b:a6:46:94 brd ff:ff:ff:ff:ff:ff

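And that works nicely. Note that the mtu argument here sets the MTU of the interface being added, so the new port ends up matching the bridge. In my case, this ended up with me writing code to look up the MTU of the bridge I was adding the interface to, and then specifying that MTU when adding the interface. I hope this helps someone else.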
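The lookup described above could be sketched roughly as follows. This is not the code I ended up writing, just a minimal illustration: it assumes the bridge's MTU can be read from sysfs at /sys/class/net/<name>/mtu, and the function names read_mtu and enslave_command are made up for this example.

```python
from pathlib import Path


def read_mtu(ifname, sysfs_root="/sys/class/net"):
    """Return the MTU of a network interface as reported by sysfs."""
    return int(Path(sysfs_root, ifname, "mtu").read_text().strip())


def enslave_command(ifname, bridge, sysfs_root="/sys/class/net"):
    """Build the ip command that adds ifname to bridge at the bridge's MTU."""
    # Look up the bridge's current MTU first, then hand that same value
    # back to ip so the new port does not change the bridge's MTU.
    mtu = read_mtu(bridge, sysfs_root)
    return ["ip", "link", "set", ifname, "master", bridge, "mtu", str(mtu)]
```

The returned list can be handed to something like subprocess.run(cmd, check=True) when running as root. The sysfs_root parameter only exists so the lookup can be pointed at a fake directory for testing.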

Manipulating Docker images without Docker installed

Recently I’ve been playing a bit more with Docker images and Docker image repositories. I had in the past written a quick hack to let me extract files from a Docker image, but I wanted to do something a little more mature than that.

For example, sometimes you want to download an image from a Docker image repository without using Docker. Naively if you had Docker, you’d do something like this:

docker pull busybox
docker save -o busybox.tar busybox

However, that assumes that you have Docker installed on the machine downloading the images, and that’s sometimes not possible for security reasons. The most obvious example I can think of is airgapped secure environments where you need to walk the data between two networks, and the unclassified network machine doesn’t allow administrator access to install Docker.

So I wrote a little tool to do image manipulation for me. The tool is called Occy Strap, is written in Python, and is available on PyPI. That means installing it is relatively simple:

python3 -m venv ~/virtualenvs/occystrap
. ~/virtualenvs/occystrap/bin/activate
pip install occystrap

None of this requires administrator permissions. There are then a few things we can do with Occy Strap.

Downloading an image from a repository and storing as a tarball

Let’s say we want to download an image from a repository and store it as a local tarball. This is a common thing to want to do in airgapped environments, for example. With Docker you would do this with a docker pull followed by a docker save. The Occy Strap equivalent is:

occystrap fetch-to-tarfile registry-1.docker.io library/busybox \
    latest busybox.tar

In this example we’re pulling from the Docker Hub (registry-1.docker.io), and are downloading busybox’s latest version into a tarball named busybox.tar. This tarball can be loaded with docker load -i busybox.tar on an airgapped Docker environment.

Downloading an image from a repository and storing as an extracted tarball

The format of the tarball in the previous example is two JSON configuration files and a series of image layers as tarballs inside the main tarball. You can write these elements to a directory instead of to a tarball if you’d like to inspect them. For example:

occystrap fetch-to-extracted registry-1.docker.io library/centos 7 \
    centos7

This example will pull the CentOS image tagged “7” from the Docker Hub, and write the content to a directory called “centos7” in the current working directory. If you tarred centos7 like this, you’d end up with a tarball equivalent to what fetch-to-tarfile produces, which could therefore be loaded with docker load:

cd centos7; tar -cf ../centos7.tar *
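If you want to sanity check a tarball like this before walking it across to the airgapped side, something along these lines might do. It is a rough sketch rather than part of Occy Strap, and it assumes the tarball follows the usual docker save layout: a top level manifest.json containing a list of entries, each with "RepoTags" and "Layers" keys. The function name describe_image_tarball is made up for this example.

```python
import json
import tarfile


def describe_image_tarball(path):
    """Return a list of (repo_tags, layer_names) for each image in the tarball."""
    with tarfile.open(path) as archive:
        names = set(archive.getnames())
        # Assumption: manifest.json sits at the top level of the tarball,
        # as it does in the output of docker save.
        manifest = json.load(archive.extractfile("manifest.json"))

    images = []
    for entry in manifest:
        # Every layer the manifest mentions should be present in the tarball.
        missing = [layer for layer in entry["Layers"] if layer not in names]
        if missing:
            raise ValueError("layers missing from tarball: %s" % ", ".join(missing))
        images.append((entry.get("RepoTags") or [], entry["Layers"]))
    return images
```

Calling describe_image_tarball("centos7.tar") should then return one tuple per image, or raise if the manifest refers to a layer that didn't make it into the tarball.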

Downloading an image from a repository and storing it in a merged directory

In scenarios where image layers are likely to be reused between images (for example many images which share a common base layer), you can save disk space by downloading images to a directory which contains more than one image. To make this work, you need to instruct Occy Strap to use unique names for the JSON elements within the image file:

occystrap fetch-to-extracted --use-unique-names registry-1.docker.io \
    homeassistant/home-assistant latest merged_images
occystrap fetch-to-extracted --use-unique-names registry-1.docker.io \
    homeassistant/home-assistant stable merged_images
occystrap fetch-to-extracted --use-unique-names registry-1.docker.io \
    homeassistant/home-assistant 2021.3.0.dev20210219 merged_images

Each of these images includes 21 layers, but at the time of writing the merged_images directory contains only 25 unique layers, rather than the 63 you would have if nothing was shared. You end up with a layout like this:

0465ae924726adc52c0216e78eda5ce2a68c42bf688da3f540b16f541fd3018c
10556f40181a651a72148d6c643ac9b176501d4947190a8732ec48f2bf1ac4fb
...
catalog.json 
cd8d37c8075e8a0195ae12f1b5c96fe4e8fe378664fc8943f2748336a7d2f2f3 
d1862a2c28ec9e23d88c8703096d106e0fe89bc01eae4c461acde9519d97b062 
d1ac3982d662e038e06cc7e1136c6a84c295465c9f5fd382112a6d199c364d20.json 
... 
d81f69adf6d8aeddbaa1421cff10ba47869b19cdc721a2ebe16ede57679850f0.json 
...
manifest-homeassistant_home-assistant-2021.3.0.dev20210219.json
manifest-homeassistant_home-assistant-latest.json
manifest-homeassistant_home-assistant-stable.json

catalog.json is an Occy Strap specific artefact which records which layers are used by which image. Each of the manifest files for the various images has also been given a unique name instead of manifest.json.

To extract a single image from such a shared directory, use the recreate-image command:

occystrap recreate-image merged_images homeassistant/home-assistant \
    latest ha-latest.tar

Exploring the contents of layers and overwritten files

Similarly, if you’d like the layers to be expanded from their tarballs to the filesystem, you can pass the --expand argument to fetch-to-extracted to have them extracted. This will also create a filesystem, named after the manifest, which is the final state of the image (the layers applied sequentially). For example:

occystrap fetch-to-extracted --expand quay.io \
    ukhomeofficedigital/centos-base latest ukhomeoffice-centos

Note that layers delete files from previous layers with files named “.wh.$previousfilename”. These files are not processed in the expanded layers, so that they are visible to the user. They are however processed in the merged layer named for the manifest file.
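To make the whiteout behaviour a little more concrete, here is a small sketch of how applying layers in order might work. It is not Occy Strap’s implementation: it only tracks path names rather than file contents, the function name merge_layers is made up for this example, and it deliberately ignores the special “opaque directory” whiteout marker that image layers can also contain.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."


def merge_layers(layers):
    """Apply layers in order and return the set of paths in the final image.

    Each layer is an iterable of paths relative to the image root.
    """
    merged = set()
    for layer in layers:
        additions = []
        for path in layer:
            directory, name = posixpath.split(path)
            if name.startswith(WHITEOUT_PREFIX):
                # A whiteout removes the named path from earlier layers,
                # along with anything underneath it if it was a directory.
                target = posixpath.join(directory, name[len(WHITEOUT_PREFIX):])
                merged = {
                    p for p in merged
                    if p != target and not p.startswith(target + "/")
                }
            else:
                additions.append(path)
        # Add this layer's own files only after its whiteouts have been
        # applied, so a layer can delete a path and then recreate it.
        merged.update(additions)
    return merged
```

For example, a first layer containing tmp/cache/a followed by a second layer containing tmp/.wh.cache leaves nothing under tmp/cache in the merged result, while the whiteout file itself never appears in the output.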

Complexity Arrangements for Sustained Innovation: Lessons From 3M Corporation

This is the second business paper I’ve read this week while reading along with my son’s university studies. The first is discussed here if you’re interested. This paper is better written, but more academic in its style. This ironically makes it harder to read, because its grammar style is more complicated and harder to parse.

The takeaway for me from this paper is that 3M is good at encouraging serendipity and opportune moments that create innovation. This is similar to Google’s attempts to build internal peer networks and its deliberate lack of structure. In 3M’s case it’s partially expressed as 15% time, which is similar to Google’s 20% time. Specifically, “eureka moments” cannot be planned or scheduled, but require prior engagement.

chance favors only the prepared mind — Pasteur

3M has a variety of methods for encouraging peer networks, including technology fairs, “bootlegging” (borrowing idle resources from other teams), innovation grants, and so on.

At the same time, 3M tries to keep at least a partial focus on events driven by schedules. The concept of time is important here: there is “a time to wait” (we are ahead of the market); “a time in between” (15% time); and “a time across” (several parallel efforts around related innovations to speed up the process).

The idea of “a time to wait” is quite interesting. 3M has a history of discovering things for which there is no current application, but somehow corporately remembering those things so that when applications appear years later they can jump in with a solution. They embrace storytelling as part of their corporate memory, as well as a way of ensuring they learn from past success and failure.

Finally, 3M is similar to Google in their deliberate flexibility with the rules. 15% time isn’t rigidly counted, for example: it might be 15% of a week, or 15% of a year, or more or less than that. As long as it can be justified as a good use of resources, it’s OK.

This was a good read and I enjoyed it.

A corporate system for continuous innovation: The case of Google Inc

So, one of my kids is studying some business units at university and was assigned this paper to read. I thought it looked interesting, so I gave it a read as well.

While not being particularly well written in terms of style, this is an approachable introduction to the culture and values of Google and how they play into Google’s continued ability to innovate. The paper identifies seven important attributes of the company’s culture that promote innovation, as ranked by the interviewed employees:

  • The culture is innovation oriented.
  • They put a lot of effort into selecting individuals who will fit well with the culture at hiring time.
  • Leaders are seen as performing a facilitation role, not a directive one.
  • The organizational structure is loosely defined.
  • OKRs and aligned performance incentives.
  • A culture of organizational learning through postmortems and building internal social networks. Learning is considered a peer to peer activity that is not heavily structured.
  • External interaction — especially in the form of aggressive acquisition of skills and technologies in areas Google feels they are struggling in.

Additionally, they identify eight habits of a good leader:

  • Be a good coach.
  • Empower your team and don’t micro-manage.
  • Express interest in employees’ success and well-being.
  • Be productive and results oriented.
  • Be a good communicator and listen to your team.
  • Help employees with career development.
  • Have a clear vision and strategy for the team.
  • Have key technical skills, so you can help advise the team.

Overall, this paper is well worth the time to read. I enjoyed it and found it insightful.

Shaken Fist v0.4.2

Shaken Fist v0.4.2 snuck out yesterday as part of shooting this tutorial video. That’s because I really wanted to demonstrate floating IPs, which I only recently got working nicely. Overall in v0.4.2 we:

  • Improved CI for image API calls.
  • Improved upgrade CI testing.
  • Improved network state tracking.
  • Floating IPs now work, and have covering CI. shakenfist#257
  • Resolved leaks of floating IPs from both direct use and NAT gateways. shakenfist#256
  • Resolved leaks of IPManagers on network delete. shakenfist#675
  • Switched to using system packages for ansible during install.

Starting your first instance on Shaken Fist (a video tutorial)

As a bit of an experiment, I’ve made this quick and dirty “vlog” style tutorial video to show you how to install Shaken Fist on a single machine and boot your first instance. I demonstrate how to install, set up your first virtual network, start the instance, inspect events that the instance has experienced, and then log in.

Let me know if you think it’s useful.

Books read in January 2021

It’s been 10 years since I’ve read enough to write one of these summary posts… Which I guess means something. This month I’ve been thinking a lot about systems design and how to avoid the second-system effect while growing a product, which guided my reading choices a fair bit. Much of that reading has been in the form of blog posts and Twitter threads, so I am going to start including those in these listings of things I’ve read.

Social media posts of note:

Books:

Shaken Fist 0.4.1

I don’t blog about every Shaken Fist release here, but I do feel like the 0.4 release (and the subsequent minor bug fix release 0.4.1) are a pretty big deal in the life of the project.

We also got a cool logo during the v0.4 cycle.

The focus of the v0.4 series is reliability — we’ve used behaviour in the continuous integration pipeline as a proxy for that, but it should be a significant improvement in the real world as well. This has included:

  • much more extensive continuous integration coverage, including several new jobs.
  • checksumming image downloads, and retrying images where the checksum fails.
  • reworked locking.
  • etcd reliability improvements.
  • refactoring instances and networks to a new “non-volatile” object model where only immutable values are cached.
  • images now track a state much like instances and networks.
  • a reworked state model for instances, where it’s clearer why an instance ended up in an error state. This is documented in our developer docs.
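As an illustration of the checksum-and-retry idea from the list above, here is a minimal sketch. It is not Shaken Fist’s actual code: the function name fetch_verified, the use of SHA-256, and the fetch callable are all assumptions made for the example.

```python
import hashlib


def fetch_verified(fetch, expected_sha256, attempts=3):
    """Call fetch() until the returned bytes match expected_sha256.

    fetch is any zero-argument callable returning the downloaded bytes.
    """
    for _ in range(attempts):
        data = fetch()
        # Only accept the download if its checksum matches what we expect,
        # otherwise throw it away and try again.
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
    raise ValueError("checksum mismatch after %d attempts" % attempts)
```

Passing the download in as a callable keeps the retry logic separate from however the bytes are actually fetched, which also makes it easy to exercise with a fake download in tests.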

In terms of new features, we also added:

  • a network ping API, which will emit ICMP ping packets on the network node onto your virtual network. We use this in testing to ensure instances booted and ended up online.
  • networks are now checked to ensure that they have a reasonable minimum size.
  • addition of a simple etcd backup and restore tool (sf-backup).
  • improved data upgrade of previous installations.
  • VXLAN ids are now randomized, and this has forced a new naming scheme for network interfaces and bridges.
  • we are smarter about what networks we restore on startup, and don’t restore dead networks.

We also now require python 3.8.

Overall, I am much more comfortable running workloads I care about on Shaken Fist v0.4 than on previous releases. It’s far from perfect, but we’re definitely moving in the right direction.

Goals Gone Wild

In 2009 Harvard Business School published a draft paper entitled “Goals Gone Wild”, and its abstract is quite concerning. For example:

“We identify specific side effects associated with goal setting, including a narrow focus that neglects non-goal areas, a rise in unethical behavior, distorted risk preferences, corrosion of organizational culture, and reduced intrinsic motivation.”

Are we doomed? Is all goal setting harmful? Interestingly, I came across this paper while reading Measure What Matters, which argues the exact opposite point — that OKRs provide a meaningful way to improve the productivity of an organization.

The paper starts by listing a series of examples of goal setting gone wrong: Sears’ auto repair business in the early 1990s overcharging customers to meet hourly billable goals; Enron’s sales targets based solely on volume and revenue and not profit; and Ford Motor Company’s goal of shipping a car at a specific target price point, which resulted in significant safety failures.

The paper then provides specific examples of how goals can go wrong:

  • By being too specific and causing other important features of a task to be ignored: for example, shipping on a specific deadline but skipping adequate testing to achieve that deadline.
  • By being too numerous: employees with more than one goal tend to focus on one and ignore the others. For example, studies have shown that if you present someone with both quality and quantity goals, they will fixate on the quantity goals over the quality ones.
  • By having an inappropriate time horizon: for example, producing quarterly results by cannibalizing longer term outcomes. Additionally, goals can be perceived as ceilings not floors; that is, once a goal has been met attention is diverted elsewhere instead of over delivering on the goal.
  • By encouraging inappropriate risk taking or unethical behaviour — if a goal is too challenging, then an employee is encouraged to take risks they would not normally be able to justify in order to meet the goal.
  • Stretch goals that are not met harm employees’ confidence in their abilities and impact future performance.
  • A narrowly focused performance goal discourages learning and collaboration with coworkers. These tasks detract from time spent on the narrowly defined target, and are therefore de-emphasised.

The paper also calls out that while most people can see some amount of intrinsic motivation in their own behaviours, goals are extrinsic motivation and can be overused when applied to an intrinsically motivated workforce.

Overall, the paper urges managers to consider whether the goals they are setting are necessary, and notes that goals should only be used in narrow circumstances.

A super simple sourdough loaf

This is the fourth in a series of posts documenting my adventures in making bread during the COVID-19 shutdown.

This post has been a while coming, but my sister-in-law was interested in the sourdough loaf last night, so I figured I should finally document my process. First off you need to have a sourdough starter, which I wrote up in a previous post. I am sure less cheaty ways will work too, but the cheating was where it was at for me.

Then, you basically follow the process I use for my super simple non-breadmaker loaf, but tweaked a little to use the starter. For the loaf itself:

  • 2 cups of bakers flour (not plain white flour)
  • 1 teaspoon of salt
  • 2 cups of the sourdough starter
  • 1 cup water

Similarly to the super simple loaf, you want the dough to be a bit tacky when mixed — it gets runnier as the yeast does its thing, so it will be too runny if it doesn’t start out tacky.

I then just leave it on the kitchen bench under a cover for the day. In the evening it’s baked like the super simple loaf: heat a high thermal mass dutch oven for 30 minutes at 230 degrees Celsius, and then bake the bread in the dutch oven, first for 30 minutes with the lid on, and then for 12 more minutes with the lid off.

You also need to feed the starter when you make the loaf dough. That’s just 1.5 cups of flour, and a cup of warm water mixed into the starter after you’ve taken out the starter for the loaf. I tweak the flour to water ratio to keep the starter at a fairly thick consistency, and you’ll learn over time what is right. You basically want pancake batter consistency.

We keep our starter in the fridge and need to feed it (which means baking) twice a week. If we kept it on the bench we’d need to bake daily.