Setting up VXLAN on Google Compute Engine

My ultimate goal here is to try out VXLAN between some VMs running on instances in Google Compute Engine, but today I’m just going to get VXLAN working between two instances, because that took a fair bit longer than I expected. First off, boot your instances. Because I will need nested virt later, I chose two instances on Google Cloud; please note that you need to do a bit of a dance to turn on nested virt there. I also chose to use Debian for this experiment:

gcloud compute instances create vx-1 --zone us-central1-b --min-cpu-platform "Intel Haswell" --image nested-vm-image

Now do those standard things you do to all new instances:

sudo apt-get update
sudo apt-get dist-upgrade -y

Now let’s set up VXLAN between the two nodes, with a big nod to this web page. First create a VXLAN interface on each machine (if you care about your VXLAN traffic using the IANA standard port, see the postscript at the end of this post):

sudo ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0

Now we need to put the two nodes into a mesh. In the first command below, 34.70.161.180 is the IP address of the node you are not running the command on; in the second command, the 192.168.200.x address needs to be different on each machine.

sudo bridge fdb append to 00:00:00:00:00:00 dst 34.70.161.180 dev vxlan0
sudo ip addr add 192.168.200.1/24 dev vxlan0
sudo ip link set up dev vxlan0

I am pretty sure that this style of mesh (all nodes connected to all other nodes) wouldn’t scale to non-trivial sizes, but hey, baby steps right? Finally, because we’re using Google Cloud we need to add firewall rules to allow our VXLAN traffic into the instances.

Note that these rules are a source of confusion for me right now. I wanted (and configured) VXLAN. So why do I need to allow OTV for this to work? I suspect Linux has politely ignored my request and used OTV not VXLAN for my traffic.
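In practice what needs to be let in is UDP traffic on port 8472, which is the port you can see in the tcpdump output below. Something roughly equivalent from the command line would look like this, where the rule name is made up and the source range is just the other node’s address from the commands above (each node needs the other node’s address allowed):

gcloud compute firewall-rules create allow-vxlan \
    --allow udp:8472 \
    --source-ranges 34.70.161.180/32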

We should now be able to ping those newly configured IP addresses from each machine:

ping 192.168.200.2 -c 1
PING 192.168.200.2 (192.168.200.2) 56(84) bytes of data.
64 bytes from 192.168.200.2: icmp_seq=1 ttl=64 time=1.76 ms

Which produces traffic like this on the underlay network:

tcpdump -n -i eth0 host 34.70.161.180
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
09:01:58.159092 IP 10.128.0.9.59341 > 34.70.161.180.8472: OTV, flags [I] (0x08), overlay 0, instance 42
IP 192.168.200.1 > 192.168.200.2: ICMP echo request, id 20119, seq 1, length 64
09:01:58.160786 IP 34.70.161.180.48471 > 10.128.0.9.8472: OTV, flags [I] (0x08), overlay 0, instance 42
IP 192.168.200.2 > 192.168.200.1: ICMP echo reply, id 20119, seq 1, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Hopefully this is helpful to someone else. Thanks again to Joe Julian for a very helpful post.

Postscript: Dale Shaw pointed out on twitter that I might still be talking VXLAN, just on a weird port. This is supported by this comment I found on the internets: “when VXLAN was first implemented in linux, UDP ports were not specified. Many vendors use 8472, and Linux uses the same port. Later, IANA allocated 4789 as the port. If you need to use the IANA port, you need to specify it with dstport”.
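In other words, if you want to be on the IANA port you create the interface with an explicit dstport instead of the dstport 0 used above:

sudo ip link add vxlan0 type vxlan id 42 dev eth0 dstport 4789

You’d then also need your firewall rules to allow UDP 4789 instead of 8472.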

Playing with the python prometheus query API

The last few days have been a bit icky around here, with my house apparently proudly residing in the major city with the dirtiest air in the world. So, I needed a distraction…

It has also been quite hot, so I wondered how my energy usage was going. I have prometheus monitoring of my power draw, so now seemed as good a time as any to learn how to do some historical querying over the API. I ended up with a python script which can output things like this: “Yesterday had a maximum temperature of 38 and we used 28.36 kWh. The average for similar days is 25.56 kWh.”
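I won’t reproduce the whole script here, but to give a flavour of the API, a minimal sketch of the sort of query_range call involved looks something like this (the Prometheus URL and the power_watts metric name are stand-ins for whatever your setup uses):

import time
import requests

PROMETHEUS = "http://localhost:9090"  # stand-in for your Prometheus server

def kwh_between(start, end):
    """Integrate a hypothetical power_watts gauge into kWh between two
    epoch timestamps, sampling once a minute."""
    resp = requests.get(
        PROMETHEUS + "/api/v1/query_range",
        params={"query": "power_watts", "start": start, "end": end, "step": "60s"},
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    if not result:
        return 0.0

    # Each sample is [timestamp, value]. With one minute steps, summing
    # the watt values gives watt-minutes, which we convert to kWh.
    watt_minutes = sum(float(value) for _, value in result[0]["values"])
    return watt_minutes / 60.0 / 1000.0

now = time.time()
print("Yesterday we used %.2f kWh" % kwh_between(now - 2 * 86400, now - 86400))

The real script also looks up the day’s maximum temperature and averages similar days, but that’s just more of the same.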

The code is on github if it is of interest to others. I am sure I could push more of this processing down into the prometheus engine, but I couldn’t see how to do it today. Hints welcome!

Coming to grips with Kubernetes in 2020: online training

There are a few online training resources I’ve had a play with while learning Kubernetes, so I figure that’s worth a quick write up. This is a follow on from my post about Kubernetes podcasts I’ve tried. I’ve tried three training providers so far:

  • The Linux Foundation Kubernetes course (LFS258 Kubernetes Fundamentals) is probably the “go to” resource for many people, and is often sold bundled with the certification exams. Unfortunately, it is really terrible. It is by far the worst course I’ve seen so far.
  • On the other hand, the Linux Academy Kubernetes course is really good. Its flaw is that you have to sign up to Linux Academy, which provides you with all-you-can-eat courses for a rather steep annual fee.
  • Finally, I discovered Mumshad Mannambeth’s Udemy courses, and frankly they’re excellent. He’s put a huge amount of effort into them and it really shows. Even better, with Udemy’s regular sales you can pick up his three Kubernetes courses (intro, admin certification, and developer certification) for under $50 AUD. There are even plenty of online quizzes.

If I was going to pick a course to try, I’d definitely go with Mumshad.

Hacking on Arlec Christmas lights with tasmota

I’m loving the wide array of electrically certified home automation devices we’re seeing now. Light bulbs, sensors, power boards, and even Christmas lights. Specifically, Arlec is shipping these app-controllable Christmas lights this year, which looked very much like they should work with Tasmota.

(Sorry for the terrible product picture; I can’t find this product online any more, so I suspect Bunnings has sold out for the year.)

Specifically, it turns out that these Arlec lights are an ESP8266 which can be flashed with tuya-convert v2 to run tasmota. Once flashed, you can control all of the functions available on the device itself, although there are parts of the protocol I haven’t fully understood yet.

Let’s start off by flashing the device:

  • First off boot your raspberry pi with tuya-convert. I used v2, and I suspect that’s important here so make sure you upgrade if you’re using something old.
  • Next, put the bud lights into programming mode by holding the button on the control box down until the light strand turns off. Release and the strand should start blinking every couple of seconds.
  • Now run the tuya-convert flashing script.
  • Now go to the tasmota-XXXX essid and enter your wifi details into the captive portal. The light strand will now reboot.
  • The light strand should now appear on your wifi, and you can find the IP address and MAC address by asking your DHCP server nicely.

Now for some basic configuration in the web UI. Set the module type to “Tuya MCU (54)”, GPIO1 to “Tuya Tx (107)”, and GPIO3 to “Tuya RX (108)”. You’ll also want to set the usual site-specific configuration options like your MQTT server and so forth.

So what is a “Tuya MCU” when it is at home? Well, it turns out that some tuya devices have an esp8266 which just talks serial to another microcontroller. It kind of makes sense if you already have a microcontroller device you want to make “smart” and you’re super dooper lazy I suppose. There is surprisingly good manufacturer documentation online.

In the case of these bud lights, I have a strong theory that we can basically cycle the modes available by pressing the physical button, but I needed a way to validate that.

You can read more about how these devices work on the tuya protocols page, or you can just jump ahead to the simple programs I wrote to explore these devices. However, a quick summary of the serial protocol spoken between the esp8266 and the MCU is helpful. Packets look like this:

  • Frame header: fixed 2 byte value 0x55aa
  • Version: 0x00
  • Command word: a byte
  • Data length: 2 bytes
  • Function length: 2 bytes
  • Function command: 1 byte
  • Checksum: 1 byte

First off I wrote a simple program to monitor the state of the device registers (called dpIds for “define product ids”). It just uses the tasmota web console to constantly ask for the current state of the dpIds (“SerialSend5 55aa0001000000”, a hard coded MCU control packet) and prints out any changes. Note that for it to work, the web console log level needs to be set to debug.
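For reference, that hard-coded packet is just the framing described above with an empty payload: the header, version, command word, a zero length, and then the checksum, which is the sum of all the preceding bytes modulo 256. A little sketch to show how such a frame hangs together, which you can verify against the hard-coded packet:

def tuya_frame(command, payload=b""):
    """Build a Tuya MCU serial frame: header 0x55aa, version 0x00, a
    command byte, a two byte big-endian payload length, the payload,
    and a checksum that is the sum of the preceding bytes modulo 256."""
    frame = bytes([0x55, 0xAA, 0x00, command]) + len(payload).to_bytes(2, "big") + payload
    return frame + bytes([sum(frame) % 256])

# Reproduces the hard-coded packet used by the monitoring program.
print(tuya_frame(0x01).hex())  # prints 55aa0001000000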

Now I can press the button on the device and see what dpIds change. A session looked like this (the notes like “solid white” are things I typed into the terminal as I went):

Clearly dpId 1 (type 1, a boolean) is the power state with 0 being off and 1 being on. This is also easily testable of course. If you send a “TuyaSend1 1,0” to the device using the web console it turns off, and if you send “TuyaSend1 1,1” it turns on. This assignment also maps to the way other Tuya MCU devices are configured, so it seems like a very safe assumption.

dpId 101 (type 4, 1 byte enum) seems to be the mode; walking through the possible values determined:

  • 0: fast pulse
  • 1: twinkle
  • 2: alternate
  • 3: alternate differently
  • 4: alternate and cause epileptic fits
  • 5: double alternate
  • 6: flash
  • 7: solid
  • 8: off
  • 9 onwards: brief all

dpId 107 (type 2, 4 byte value) wasn’t changeable via the physical button, so I wrote another simple script to send a bunch of values to it. It appears to be a brightness control for the white LEDs, with 0 being off and 99 being fully on. I haven’t managed to find a brightness control for the coloured LEDs. The brightness control also doesn’t appear to work in all modes.
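The actual script is with the other exploration programs linked earlier, but the idea is as simple as it sounds: walk a range of values into the dpId via the Tasmota web API and watch what the lights do. Something along these lines, with the device IP being a placeholder:

import time
import requests

DEVICE = "http://192.168.1.50"  # placeholder for the light strand's IP address

# TuyaSend2 sends an integer value to a dpId; sweep dpId 107 and watch
# what the lights do at each step.
for value in range(0, 100, 10):
    requests.get(DEVICE + "/cm", params={"cmnd": "TuyaSend2 107,%d" % value})
    time.sleep(2)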

dpId 108 (type 2, 4 byte value) remains a mystery to me at this time. It doesn’t seem to change regardless of what values I send it.

dpId 109 (type 4, 1 byte enum) seems to be which strand is on. A value of 0 is just white LEDs, 1 is just coloured LEDs, 2 is all LEDs, and 3 is all LEDs but dimmer.

It sort of doesn’t matter that I haven’t fully decoded the inner workings of the device, because this is enough information for my use case. All I really want is for all the lights to be on solidly (that is, with no blinking). This is because I use them for lighting under my back pergola and the blinking would be quite annoying.

So how do you wire up Home Assistant to send serial packets to a slave MCU over MQTT? Home Assistant will already control the lights turning on and off because of the default relay implementation for dpId 1. What I need to be able to do is turn on all the lights in a solidly on configuration. For that, I can implement rules like:

>> Rule1
ON Event#0 DO TuyaSend4 101,0 ENDON
ON Event#1 DO TuyaSend4 101,1 ENDON
ON Event#2 DO TuyaSend4 101,2 ENDON
ON Event#3 DO TuyaSend4 101,3 ENDON
ON Event#4 DO TuyaSend4 101,4 ENDON
ON Event#5 DO TuyaSend4 101,5 ENDON
ON Event#6 DO TuyaSend4 101,6 ENDON
ON Event#7 DO TuyaSend4 101,7 ENDON
ON Event#8 DO TuyaSend4 101,8 ENDON
ON Event#9 DO TuyaSend4 101,9 ENDON

>> Rule1 ON

>> Rule2
ON Power1#state=1 DO TuyaSend1 1,1 ENDON
ON Power1#state=1 DO TuyaSend4 109,2 ENDON

>> Rule2 ON

You apply these rules by pasting them into the web console on the device; note that there are four separate commands to paste. Rule1 exposes the modes from dpId 101 as effects in Home Assistant, and Rule2 hooks into the power on MQTT command to ensure that the lights are set to solidly on via dpId 109.
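You can also poke these topics directly to check the rules are behaving, for example with mosquitto_pub (the broker address is a placeholder, and the sonoff14 topic matches the Home Assistant configuration below):

mosquitto_pub -h my.mqtt.server -t cmnd/sonoff14/Event -m 7
mosquitto_pub -h my.mqtt.server -t cmnd/sonoff14/POWER -m ON

The first selects the solid mode via Rule1, and the second turns the lights on, which also triggers Rule2.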

The matching Home Assistant configuration looks like this:

# Arlec fairy lights on the back deck
- platform: mqtt
  name: "Back deck 1"
  command_topic: "cmnd/sonoff14/POWER"
  state_topic: "tele/sonoff14/STATE"
  state_value_template: "{{value_json.POWER}}"
  availability_topic: "tele/sonoff14/LWT"
  effect_command_topic: "cmnd/sonoff14/Event"
  effect_list:
    - 0
    - 1
    - 2
    - 3
    - 4
    - 5
    - 6
    - 7
    - 8
    - 9
  payload_on: "ON"
  payload_off: "OFF"
  payload_available: "Online"
  payload_not_available: "Offline"
  qos: 1
  retain: false

This gives me the ability to turn the fairy lights on and off via Home Assistant, and to ensure that they’re solidly on and not blinking. That’s good enough for now.

Further thoughts on Azure instance start times

My post from the other day about slow instance starts on Azure caused some commentary (mainly on reddit) that prompted me to think more about all this. In the end, there were a few more experiments I wanted to run to see if I could squeeze more performance out of Azure.

First off, the logs from my initial testing suggest that resource groups are slow. The original terraform creates a resource group as part of the test and then cleans it up at the end. What if instead we had a single permanent resource group and created instances within that?

Here is a series of instance starts and deletes using the terraform from the last post:

You’ll notice that there’s no delete value for the last instance. That’s because terraform crashed and never deleted the instance. You can also see that instance starts are somewhat consistent, except for being slower in the second half of the test than the first, and occasionally spiking out to very very slow. Oh, and deletes are almost always really slow.

What happens if we use a permanent resource group and network? This means that all the “instance start terraform” is doing is creating a network interface and then an instance which uses that network interface. It has to be faster, but does it resolve our issues?

The dashed lines are the graph from above, the solid lines are the new data without resource group creation. You can see that abstracting away the resource group work has made a significant performance improvement. Instance start times are now generally under 100 seconds (which is still three times slower than AWS, and four or five times slower than Google).

So is it just that the Australian Azure regions are slow? I re-ran the new terraform against a US data center (East US). Here’s a zoom in on just the instance creates, with the resource group work removed to make things clearer, for both data centers:

Interestingly, the Australian data center actually performs better than the US one, which isn’t what I would expect at all. You can also see in this test run that we do still see some unexpectedly slow instance launches, although they feel less frequent and smaller when they happen. That might also just be that I’m testing over a weekend and the data center might be more idle.

Looping back, I think we’ve learnt that resource groups are expensive. The last thing I wanted to dig into was what exactly was happening in those spikes in the runs where resource groups were included. Luckily, they were happening around the point I started logging the terraform trace output of each run.

For example, run azure_1576926569_7_0_apply took 18 minutes and 3 seconds to create the instance. For those 18 minutes, terraform logs that the instance was marked by the Azure API as being in provisioningState “Creating”. This correlates with operation id c983b272-fa32-4814-b858-adab3da4d9b1 sitting in state “InProgress”; unfortunately there isn’t a reason logged for why that is. So I guess it’s not possible as an Azure user to work out why things are sometimes slow.

To summarise some advice for terraform users on Azure: don’t create resource groups as part of your runs if you can avoid it. Create long-lived, shared resource groups and then place new objects into them instead. That said, you’re still going to have slower and less consistent performance than on the other clouds.
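In terraform terms, that means referencing an existing resource group with a data source rather than declaring an azurerm_resource_group resource in the per-run code. A rough sketch of the idea, with the group name being a placeholder for whatever you created by hand:

# Look up a resource group that was created once, outside this terraform
# run, instead of declaring an azurerm_resource_group resource that gets
# created and destroyed every time.
data "azurerm_resource_group" "shared" {
  name = "my-permanent-group"
}

Network interfaces, instances and so on then set their resource_group_name to data.azurerm_resource_group.shared.name (and can reuse its location), so the per-instance terraform never creates or destroys the group itself.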

Finally, is instance start time a valid metric for cloud performance? Probably not. That said, it is table stakes to be in the conversation. Slow instance starts affect my overall experience of the cloud, as well as the workability of horizontal scaling techniques. This is especially true for instance start times which vary wildly like Azure’s do: I simply can’t trust that I can grow a horizontal scaling set within any sort of reasonable timeframe.

Why is Azure so slow to start instances?

I’ve been playing with terraform recently, and decided to see how different the terraform for launching a simple Ubuntu instance is across various clouds. There are two big questions there for me: how big is the variation between OpenStack derived clouds, and how painful is it to move between the proprietary clouds? Part of this is because terraform doesn’t present a standardised layer of cloud functionality; it has a provider per cloud.

(Although, I suspect there’s nothing stopping someone from writing a libcloud provider or something like that. It is an interesting idea which requires some additional thought.)

My terraform implementations for each cloud are on github if you’re interested. I don’t want to spend a lot of time analysing the actual terraform, because I think the really interesting thing I found isn’t where I expected it to be (there’s a hint in the title of this post). That said, the OpenStack clouds vary mostly by capabilities; Vexxhost, for example, seems to only offer flavors that require boot-from-volume. The proprietary clouds are complete re-writes, but are generally relatively simple and well documented.

However, that interesting accidental thing — as best as I can tell, Microsoft Azure is really really slow to launch instances. The graph below presents five instance launches on each cloud I tested:

As you can see, Vault, Vexxhost, and AWS are basically all in the same ballpark. Google and Azure are outliers, with Google being crazy fast (but also very slow to delete instances, a metric not presented here), and Azure being more than three times slower than everyone else.

Instance launch time isn’t a great metric to be honest, but it does matter. For example if you were trying to autoscale a web tier or a kubernetes cluster, then waiting over two minutes just for the instance to boot before it can be configured and added to the cluster is probably not ok.

I wonder why Azure is so slow?

I did some further exploring after writing this post and was able to improve performance by changing how I handled resource groups in the terraform. The performance still isn’t great though. You can read more about that in a separate post if you’d like.

Coming to grips with Kubernetes in 2020: podcasts

It has become clear to me that it is time to care about Kubernetes more. I’m sure many people have cared for ages, but the things I want to build at the moment are starting to be more container based now that I am thinking more at the application layer than the cloud infrastructure layer. So how to do that? I thought I’d write down some notes on what has worked (or not) for me, in the hope it will help others. In this post, podcasts.

I thought podcasts would be an interesting way to get started with some nice overviews. This is especially true because I’m already a pretty heavy podcast user, so it was easy to slot into my existing routine. Unfortunately this hasn’t really worked out. I started with the podctl podcast, but they only ever talk about Red Hat stuff. It is very rare for a guest to not be a Red Hat employee for example. The presenters of this podcast seem to also really dislike OpenStack for reasons they never explain, which is annoying.

Then I figured maybe the Google Kubernetes podcast would be better, but it often lacks the depth I am interested in.

I am yet to find a good podcast which deep dives into the technology instead of just talking about what is in the latest release. So maybe these podcasts are useful if you’re interested in what dropped in the most recent release, but they’re neither a good nor a systematic way to get introduced to Kubernetes.

That said, I only just discovered the TGI Kubernetes youtube channel yesterday. It is not really what I wanted in a podcast given it’s a video blog, but I think it has the potential to be interesting. I will update this post when I’ve had a chance to check it out in more depth.

Have you found a good Kubernetes podcast? Am I being wildly unfair?

If I Understood You, Would I Have This Look on My Face?

This book discusses science and technical communication from the perspective of someone who comes from professional theatre and acting. Alan Alda explains how his accidental discovery of the application of theatre sports to communication created an opportunity to teach technical communicators how to be more effective. Essentially, the argument is that empathy is essential to communication: you need to be able to understand where your audience is starting from, and where they’re likely to get stuck, before you can take them on the journey.

Unsurprisingly given the topic of the book, this is a well written and engaging read. The book is nicely structured and uses regular anecdotes (some of them humorous) to get its message across.

A detailed and fun read.

If I Understood You, Would I Have This Look on My Face?
Alan Alda
Self-Help
Random House
June 6, 2017
240 pages

NEW YORK TIMES BESTSELLER • Award-winning actor Alan Alda tells the fascinating story of his quest to learn how to communicate better, and to teach others to do the same. With his trademark humor and candor, he explores how to develop empathy as the key factor. “Invaluable.”—Deborah Tannen, #1 New York Times bestselling author of You’re the Only One I Can Tell and You Just Don’t Understand Alan Alda has been on a decades-long journey to discover new ways to help people communicate and relate to one another more effectively. If I Understood You, Would I Have This Look on My Face? is the warm, witty, and informative chronicle of how Alda found inspiration in everything from cutting-edge science to classic acting methods. His search began when he was host of PBS’s Scientific American Frontiers, where he interviewed thousands of scientists and developed a knack for helping them communicate complex ideas in ways a wide audience could understand—and Alda wondered if those techniques held a clue to better communication for the rest of us. In his wry and wise voice, Alda reflects on moments of miscommunication in his own life, when an absence of understanding resulted in problems both big and small. He guides us through his discoveries, showing how communication can be improved through learning to relate to the other person: listening with our eyes, looking for clues in another’s face, using the power of a compelling story, avoiding jargon, and reading another person so well that you become “in sync” with them, and know what they are thinking and feeling—especially when you’re talking about the hard stuff. Drawing on improvisation training, theater, and storytelling techniques from a life of acting, and with insights from recent scientific studies, Alda describes ways we can build empathy, nurture our innate mind-reading abilities, and improve the way we relate and talk with others. Exploring empathy-boosting games and exercises, If I Understood You is a funny, thought-provoking guide that can be used by all of us, in every aspect of our lives—with our friends, lovers, and families, with our doctors, in business settings, and beyond. “Alda uses his trademark humor and a well-honed ability to get to the point, to help us all learn how to leverage the better communicator inside each of us.”—Forbes “Alda, with his laudable curiosity, has learned something you and I can use right now.”—Charlie Rose

Prometheus 2.12, query logging, and startup failures on macOS

Prometheus v2.12 added active query logging. The basic idea is that there is an mmapped JSON file that contains all of the queries currently running. If prometheus were to crash, that file would therefore be a list of the queries running at the time of the crash. Overall, not a bad idea.

Some friends had recently added prometheus to their development environments. This is wired up to grafana dashboards for their microservices, and prometheus is configured to store 14 days worth of time series data via a persistent volume from the developer desktops. We did this because it is valuable for the developers to be able to see the history of metrics before and after their changes.

Now we have a developer using macOS as their primary development platform, and since prometheus 2.12 this setup hasn’t worked for him. Specifically, this developer is using Parallels to provide the docker virtual machine on his Mac. You can summarise the startup for prometheus in the dev environment like this:

$ docker run ...stuff...
...snip...
level=error ts=2019-09-15T02:20:23.520Z caller=query_logger.go:94 component=activeQueryTracker msg="Failed to mmap" file=/prometheus-data/data/queries.active Attemptedsize=20001 err="invalid argument"
panic: Unable to create mmap-ed active query log

goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x7fff9917af38, 0x15, 0x14, 0x2a6b7c0, 0xc00003c7e0, 0x2a6b7c0)
	/app/promql/query_logger.go:112 +0x4d2
main.main()
	/app/cmd/prometheus/main.go:361 +0x52bd

And here’s the underlying problem: because of the way the persistent data is mapped into this container (via Parallels file sharing in this case), the mmap of the active queries file fails and prometheus fails to start.

In other words, since prometheus 2.12 your prometheus data files have to be stored on a filesystem which supports mmap. Additionally, there is no flag to just disable the active query logger.

So how do we work around this? Well, here’s a horrible workaround: in the data directory that is volume mapped into the container, create a symlink to a path that is mmapable inside the docker container, even if that path doesn’t exist outside the container. For example, given that we store the prometheus time series at $CONFIG/prometheus-data:

$ ln -s /tmp/queries.active "$CONFIG/prometheus-data/queries.active"

Note that /tmp/queries.active does not exist on the developer’s Mac. Prometheus now starts and it’s puppies and kittens the whole way down.