The Calculating Stars


Winner of the Hugo, Nebula, and Locus awards, this book is about a mathematical prodigy battling her way into a career as an astronaut in a post-apocalyptic 1950s America. Along the way she has to take on the embedded sexism of America in the 50s, as well as her own mild racism. Worse, she suffers from an anxiety condition.

The book is engaging and well written, with an alternative history plot line that is believable and interesting. In fact, it's quite topical for our current time.

I really enjoyed this book and I will definitely be reading the sequel.

The Calculating Stars
Mary Robinette Kowal
May 16, 2019
432 pages

The Right Stuff meets Hidden Figures by way of The Martian. A world in crisis, the birth of space flight and a heroine for her time and ours; the acclaimed first novel in the Lady Astronaut series has something for everyone. On a cold spring night in 1952, a huge meteorite fell to earth and obliterated much of the east coast of the United States, including Washington D.C. The ensuing climate cataclysm will soon render the earth inhospitable for humanity, as the last such meteorite did for the dinosaurs. This looming threat calls for a radically accelerated effort to colonize space, and requires a much larger share of humanity to take part in the process. Elma York's experience as a WASP pilot and mathematician earns her a place in the International Aerospace Coalition's attempts to put man on the moon, as a calculator. But with so many skilled and experienced women pilots and scientists involved with the program, it doesn't take long before Elma begins to wonder why they can't go into space, too. Elma's drive to become the first Lady Astronaut is so strong that even the most dearly held conventions of society may not stand a chance against her.


Configuring load balancing and location headers on Google Cloud


I have a need at the moment to know where my users are in the world. This helps me identify which compute resources to serve their requests with in order to reduce the latency they experience. So how do you do that with Google Cloud?

The first step is to set up a series of test backends to send traffic to. I built three regions: Sydney, London, and Los Angeles. In hindsight that wasn't actually necessary, though; this would work with a single backend just as well. For my backends I chose a minimal Ubuntu install, running this simple backend HTTP service.
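
In case that link ever rots, here is a minimal sketch of what such a backend might look like, assuming Flask (the routes and port here are my choices, not necessarily what the linked service does):

from flask import Flask, request

app = Flask(__name__)

@app.route('/')
def index():
    # Echo the request headers back as plain text, so we can see what
    # the load balancer inserted
    body = '\n'.join('%s: %s' % (k, v) for k, v in request.headers.items())
    return body, 200, {'Content-Type': 'text/plain'}

@app.route('/healthz')
def healthz():
    # The health check configured later in this post expects "OK"
    return 'OK'

app.run(host='0.0.0.0', port=80)

(Port 80 requires root; use a higher port and adjust the load balancer configuration if that bothers you.)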

I had some initial trouble finding a single page which walked through the setup of the Google Cloud load balancer to do what I wanted, which is the main reason for writing this post. The steps are:

Create your test instances and configure the backend on them. I ended up with a setup like this:

A list of google cloud VMs

Next, set up instance groups to contain these instances. I chose unmanaged instance groups (that is, I don't want autoscaling). You need to create one per region.

A list of google cloud instance groups

But wait! There’s one more layer of abstraction. We need a backend service. The configuration for these is cunningly hidden on the load balancing page, on a separate tab. Create a service which contains our three instance groups:

A sample backend service

I’ve also added a health check to my service, which just requests “/healthz” from each instance and expects a response of “OK” for healthy backends.

The backend service is also where we configure our extra headers. Click on the “advanced configurations” link, and more options appear:

Additional backend service options

Here I set up the extra HTTP headers the load balancer should insert: X-Region, X-City, and X-Lat-Lon.

And finally we can configure the load balancer itself. I selected an “HTTP(S) load balancer”, as I only care about incoming HTTP and HTTPS traffic. Set the frontend to route traffic from the Internet, and select the backend service we created earlier as the backend.

Now we can test! If I go to my load balancer in a web browser, the top part of the resulting page is just the HTTP headers from the request, and you can see that we're now getting helpful location headers. Mission accomplished!


Interviewing hints (or, so you’ve been laid off…)


This post is an attempt to collect a set of general hints and tips for resumes and interviews. It is not concrete truth though; like all things, this process is subjective and will differ from place to place. It originally started as a Google doc shared around a previous workplace during some layoffs, but it seems more useful than that, so I am publishing it publicly.

I’d welcome comments if you think it will help others.

So something bad happened

I have the distinction of having been through layoffs three times now. I think there are some important first steps:

  • Take a deep breath.
  • Hug your loved ones and then go and sweat on something — take a walk, go to the gym, whatever works for you. Research shows that exercise is a powerful mood stabiliser.
  • Make a plan. Who are you going to apply with? Who could refer you? What do you want to do employment wise? Updating your resume is probably a good first step in that plan.
  • Treat finding a job as your job. You probably can’t do it for eight hours a day, but it should be your primary goal for each “workday”. Have a todo list, track things on that list, and keep track of status.

And remember, being laid off isn’t about you, it is about things outside your control. Don’t take it as a reflection on your abilities.

Resumes

  • The goal of a resume is to get someone to want to interview you. It is not meant to be a complete description of everything you’ve done. So, keep it short and salesy (without lying through oversimplification!).
  • Resumes are also cultural — US firms tend to expect a short summary (two pages), while Australian firms seem to expect something longer and more detailed. So, ask your friends if you can see their resumes to get a sense of the right style for the market you're operating in. It is possible you'll end up with more than one version if you're applying in two markets at once.
  • Speaking of friends, referrals are gold. Perhaps look through your LinkedIn and other social media and see where people you’ve formerly worked with are now. If you have a good reputation with someone and they’re somewhere cool, ask them to refer you for a job. It might not work, but it can’t hurt.
  • Ratings for skills on LinkedIn help recruiters find you. So perhaps rate your friends for things you think they’re good at and then ask them to return the favour?

Interviews in general

The soft interview questions we all get asked:

  • I would expect to be asked what I've done in my career — an “introduce yourself” moment. So try to have a coherent story that is short but interesting: “I'm a system admin who has been working on cloud orchestration and software defined networking for Australia's largest telco”, for example.
  • You will probably be asked why you’re looking for work too. I think there’s no shame in honesty here, something like “I worked for a small systems integrator that did amazing things, but the main customer has been doing large layoffs and stopped spending”.
  • You will also probably be asked why you want this job / want to work with this company. While everyone really knows it is because you enjoy having money, find other things beforehand to say instead. “I want to work with Amazon because I love cloud, Amazon is kicking arse in that space, and I hear you have great people I’d love to work with”.

Note here: the original version of the above point said “I’d love to learn from”, but it was mentioned on Facebook that the flow felt one way there. It has been tweaked to express a desire for a two way flow of learning.

“What have you done” questions: the reality is that almost all work is collaborative these days. So, have some stories about things you’ve personally done and are proud of, but also have some stories of delivering things bigger than one person could do. For example, perhaps the ansible scripts for your project were super cool and mostly you, but perhaps you should also describe how the overall project was important and wouldn’t have worked without your bits.

Silicon Valley interviews: organizations like Google, Facebook, et cetera want to be famous for having hard interviews. Google will deliberately probe until they find an area you don’t know about and then dig into that. Weirdly, they’re not doing that to be mean — they’re trying to gauge how you respond to new situations (and perhaps stress). So, be honest if you don’t know the answer, but then offer up an informed guess. For example, I used to ask people about system calls and strace. We’d keep going until we hit the limit of what they understood. I’d then stop and explain the next layer and then ask them to infer things — “assuming that things work like this, how would this probably work”? It is important to not panic!

Interviews as a sysadmin

  • Interviewers want to know about your attitude as well as your skills. As sysadmins, sometimes we are faced with high pressure situations — something is down and we need to get it back up and running ASAP. Have a story ready to tell about a time something went wrong. You should demonstrate that you took the time to plan before acting, even in an emergency scenario. Don't leave the interviewer thinking you'll be the person who accidentally deletes everyone's data because you're in a rush.
  • An understanding of how the business functions and why “IT” is important is needed. For example, if you get asked to explain what a firewall is, be sure to talk about how it relates to “security policy” as well as the technical elements (ports, packet inspection & whatnot).
  • Your ability to learn new technologies is as important as the technologies you already know.

Interviews as a developer

  • I think people look for curiosity here. Everyone will encounter new things, so they want to hear that you like learning, are a self starter, and can pick up new stuff. Having just taken and passed the CKA exam, for example, would be a great thing to mention.
  • You need to have examples of things you have built and why those were interesting. Was the thing poorly defined before you built it? Was it experimental? Did it have a big impact for the customer?
  • An open source portfolio can really help — it means people can concretely see what you’re capable of instead of just playing 20 questions with you. If you don’t have one, don’t start new projects — go find an existing project to contribute to. It is much more effective.

Writing a terraform remote state server


Terraform is a useful tool for deploying cloud resources. This post isn’t an introduction to terraform, so I’ll assume you already know and love it. If you want more, then this getting started guide would be a sensible start.

At its most basic level, terraform deploys cloud resources and stores information about those resources in a file on local disk called terraform.tfstate — it needs that state information so it can make later changes to the deployment, be those modifying resources in use or tearing the whole deployment down. If you had an operations team working on an environment, you could store the tfstate file in git or on a shared filesystem so that the entire team could manage the deployment. However, nothing in that approach stops two members of the team from making overlapping changes.

That's where terraform state servers come in. State servers can implement optional locking, which stops overlapping operations from happening. The protocol these servers speak isn't well documented (that I could find), so I wanted to explore it further. To do that, I wrote a simple terraform HTTP state server in python.

To use this state server, configure your terraform file as per demo.tf. The important bits are:

terraform {
  backend "http" {
    address = "http://localhost:5000/terraform_state/4cdd0c76-d78b-11e9-9bea-db9cd8374f3a"
    lock_address = "http://localhost:5000/terraform_lock/4cdd0c76-d78b-11e9-9bea-db9cd8374f3a"
    lock_method = "PUT"
    unlock_address = "http://localhost:5000/terraform_lock/4cdd0c76-d78b-11e9-9bea-db9cd8374f3a"
    unlock_method = "DELETE"
  }
}

The URL to the state server will obviously change for your deployment. The UUID in the URL (4cdd0c76-d78b-11e9-9bea-db9cd8374f3a in this case) is an example of an external ID you might use to correlate the terraform state with the system that requested it be built. It doesn't have to be a UUID; it can be any string.
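
If the system driving terraform is generating these configurations, the backend block is easy to template. A hypothetical sketch in python:

import uuid

TEMPLATE = '''terraform {
  backend "http" {
    address = "http://localhost:5000/terraform_state/%(id)s"
    lock_address = "http://localhost:5000/terraform_lock/%(id)s"
    lock_method = "PUT"
    unlock_address = "http://localhost:5000/terraform_lock/%(id)s"
    unlock_method = "DELETE"
  }
}'''

# Generate a fresh external ID for this deployment and emit its config
print(TEMPLATE % {'id': uuid.uuid4()})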

I am using PUT and DELETE for locks due to limitations in the HTTP verbs that the python flask framework exposes. You might be able to get away with the defaults in other languages or frameworks.
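
To give a flavour of the protocol, here's a minimal in-memory sketch of a state server as I understand things (the real stateserver.py in the repository is the version I actually use; this sketch forgets everything when it exits):

from flask import Flask, jsonify, request

app = Flask(__name__)
states = {}  # external id -> terraform state document
locks = {}   # external id -> lock holder document

@app.route('/terraform_state/<tf_id>', methods=['GET'])
def get_state(tf_id):
    # terraform fetches the current state before planning changes
    if tf_id not in states:
        return '', 404
    return jsonify(states[tf_id])

@app.route('/terraform_state/<tf_id>', methods=['POST'])
def set_state(tf_id):
    # terraform posts the updated state after applying changes
    states[tf_id] = request.get_json(force=True)
    return '', 200

@app.route('/terraform_lock/<tf_id>', methods=['PUT'])
def lock(tf_id):
    # Refuse the lock if someone already holds it, returning the current
    # holder so terraform can tell the user who has it
    if tf_id in locks:
        return jsonify(locks[tf_id]), 423
    locks[tf_id] = request.get_json(force=True)
    return '', 200

@app.route('/terraform_lock/<tf_id>', methods=['DELETE'])
def unlock(tf_id):
    locks.pop(tf_id, None)
    return '', 200

app.run(port=5000)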

To run the full python server from the repository, make a venv, install the dependencies, and then run:

$ python3 -m venv ~/virtualenvs/remote_state
$ . ~/virtualenvs/remote_state/bin/activate
$ pip install -U -r requirements.txt
$ python stateserver.py

I hope someone else finds this useful.


Setting up VXLAN between nested virt VMs on Google Compute Engine


I wanted to play with a VXLAN mesh between VMs on more than one hypervisor node, but the setup for VXLAN ended up being a separate post because it was a bit long. Read that post first if you want to follow the instructions here.

Now that we have a working VXLAN mesh between our two nodes we can move on to installing libvirt (which is called libvirt-daemon-system on Debian, not libvirt-bin as on Ubuntu):

sudo apt-get install -y qemu-kvm libvirt-daemon-system
sudo virsh net-start default
sudo virsh net-autostart --network default

I'm going to use a little python helper to launch my VMs, so I need some other dependencies as well:

sudo apt-get install -y python3-pip pkg-config libvirt-dev git

git clone https://github.com/mikalstill/shakenfist
cd shakenfist
git checkout 6bfac153d249752b27d224ad9d079095b640498e

sudo mkdir /srv/shakenfist
sudo cp template.debian.xml /srv/shakenfist/template.xml
sudo pip3 install -r requirements.txt

Let’s launch a quick test VM to make sure the helper works:

sudo python3 daemon.py
sudo virsh list

You can destroy that VM for now; it was just testing the install.

sudo virsh destroy ...name...

Next we need to tweak the template that shakenfist is using to start instances so that it uses the bridge for networking (that template is the one you copied to /srv/shakenfist/template.xml earlier). Replace the interface section in the template with this on both nodes:

<interface type='bridge'>
  <mac address='{{eth0_mac}}'/>
  <source bridge='br-vxlan0'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

I know the bridge mentioned here doesn’t exist yet, but we’ll deal with that in a second. Before we start VMs though, we need a way of getting IP addresses to them. shakenfist can configure interfaces using config drive, but I’d prefer to use DHCP because who doesn’t love some additional complexity?

On one of the nodes install docker:


sudo apt-get install apt-transport-https ca-certificates curl gnupg2 software-properties-common
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io

Now we can set up DHCP. Create a place for the configuration file:

sudo mkdir /srv/shakenfist/dhcp

And then create the configuration file at /srv/shakenfist/dhcp/dhcpd.conf with contents like this:

default-lease-time 3600;
max-lease-time 7200;
option domain-name-servers 8.8.8.8;
authoritative;

subnet 192.168.200.0 netmask 255.255.255.0 {
  option routers 192.168.200.1;
  option broadcast-address 192.168.200.255;

  pool {
    range 192.168.200.10 192.168.200.254;
  }
}

Before we can start dhcpd, we need to move the VXLAN device into a bridge so we can add a device for the DHCP server to it. First off remove the vxlan0 device from the last post:

sudo ip link set down dev vxlan0
sudo ip link del vxlan0

And now recreate it with a bridge:

sudo ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0
sudo bridge fdb append to 00:00:00:00:00:00 dst 34.70.161.180 dev vxlan0
sudo ip link add br-vxlan0 type bridge
sudo ip link set vxlan0 master br-vxlan0
sudo ip link set vxlan0 up
sudo ip link set br-vxlan0 up
sudo ip link add dhcp-vxlan0 type veth peer name dhcp-vxlan0p
sudo ip link set dhcp-vxlan0p master br-vxlan0
sudo ip link set dhcp-vxlan0 up
sudo ip link set dhcp-vxlan0p up
sudo ip addr add 192.168.200.1/24 dev dhcp-vxlan0

This block of commands:

  • recreated the vxlan0 interface
  • added it to the mesh with the other node again
  • created a bridge named br-vxlan0
  • moved the vxlan0 interface into it
  • created a veth pair called dhcp-vxlan0 and dhcp-vxlan0p
  • moved the peer part of that veth pair into the bridge
  • and then configured an IP on the external half of the veth pair

To make the bridge survive reboots you would need to add it to either /etc/network/interfaces or /etc/netplan/01-netcfg.yml depending on your distribution, but that’s outside the scope of this post.

You should be able to ping again. From the other node give it a try:

$ ping 192.168.200.1
PING 192.168.200.1 (192.168.200.1) 56(84) bytes of data.
64 bytes from 192.168.200.1: icmp_seq=1 ttl=64 time=19.3 ms
64 bytes from 192.168.200.1: icmp_seq=2 ttl=64 time=0.571 ms

We need to do something similar on the other node so it can run VMs as well. It is a tiny bit simpler because there won't be any DHCP there. Remember that you need to change 35.223.115.132 to the IP of your first node:

sudo ip link set down dev vxlan0
sudo ip link del vxlan0

sudo ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0
sudo bridge fdb append to 00:00:00:00:00:00 dst 35.223.115.132 dev vxlan0
sudo ip link add br-vxlan0 type bridge
sudo ip link set vxlan0 master br-vxlan0
sudo ip link set vxlan0 up
sudo ip link set br-vxlan0 up

Note that we can't do a ping test this time, because the base OS on the second node no longer has an IP on the overlay network.

Now we can start the docker container with dhcpd listening on dhcp-vxlan0:

sudo docker run -it --rm --init --net host -v /srv/shakenfist/dhcp:/data networkboot/dhcpd dhcp-vxlan0

This runs dhcpd interactively so we can see what happens. Now try starting a VM on the other node:

sudo python3 daemon.py

You can watch the VM booting using the “virsh console” command with the name of the VM from “virsh list”. The dhcpd process should show you something like this:

sudo docker run -it --rm --init --net host -v /srv/shakenfist/dhcp:/data networkboot/dhcpd dhcp-vxlan0
Internet Systems Consortium DHCP Server 4.3.5
Copyright 2004-2016 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Config file: /data/dhcpd.conf
Database file: /data/dhcpd.leases
PID file: /var/run/dhcpd.pid
Wrote 0 leases to leases file.
Listening on LPF/dhcp-vxlan0/06:ff:bc:7d:11:e3/192.168.200.0/24
Sending on   LPF/dhcp-vxlan0/06:ff:bc:7d:11:e3/192.168.200.0/24
Sending on   Socket/fallback/fallback-net
Server starting service.
DHCPDISCOVER from ee:95:4d:40:ca:a6 via dhcp-vxlan0
DHCPOFFER on 192.168.200.10 to ee:95:4d:40:ca:a6 (foo) via dhcp-vxlan0
DHCPREQUEST for 192.168.200.10 (192.168.200.1) from ee:95:4d:40:ca:a6 (foo) via dhcp-vxlan0
DHCPACK on 192.168.200.10 to ee:95:4d:40:ca:a6 (foo) via dhcp-vxlan0

You can see here that our new VM got the IP 192.168.200.10 from the DHCP server! It is moments like this, when you don't realise that this blog post took me hours to write, that I feel really smart.

If we started a VM on the first node (the same command as for the second node), we’d now have two VMs on a virtual network which had working DHCP and could ping each other. I think that’s enough for one evening.


Setting up VXLAN on Google Compute Engine


So my ultimate goal here is to try out VXLAN between some VMs on instances in Google Compute Engine, but today I'm just going to get VXLAN working, because that took a fair bit longer than I expected. First off, boot your instances — because I will need nested virt later, I chose two instances on Google Cloud. Please note that you need to do a bit of a dance to turn on nested virt there. I also chose to use Debian for this experiment:

gcloud compute instances create vx-1 --zone us-central1-b --min-cpu-platform "Intel Haswell" --image nested-vm-image

Now do those standard things you do to all new instances:

sudo apt-get update
sudo apt-get dist-upgrade -y

Now let's set up VXLAN between the two nodes, with a big nod to this web page. First create a VXLAN interface on each machine (if you care about your VXLAN traffic being on the IANA standard port, see the postscript at the end of this post):

sudo ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0

Now we need to put the two nodes into a mesh, where 34.70.161.180 is the IP of the other node (the one you are not running this command on). The 192.168.200.x address in the second command needs to be different on each machine: use .1 on one node and .2 on the other.

sudo bridge fdb append to 00:00:00:00:00:00 dst 34.70.161.180 dev vxlan0
sudo ip addr add 192.168.200.1/24 dev vxlan0
sudo ip link set up dev vxlan0

I am pretty sure that this style of mesh (all nodes connected to all nodes) wouldn't scale past non-trivial sizes, but hey, baby steps right? Finally, because we're using Google Cloud we need to add firewall rules to allow our VXLAN traffic (UDP port 8472) into the instances.

Note that these rules were a source of confusion for me. I wanted (and configured) VXLAN, so why do I need to allow OTV for this to work? I suspected Linux had politely ignored my request and used OTV not VXLAN for my traffic (see the postscript for how that turned out).

We should now be able to ping those newly configured IP addresses from each machine:

ping 192.168.200.2 -c 1
PING 192.168.200.2 (192.168.200.2) 56(84) bytes of data.
64 bytes from 192.168.200.2: icmp_seq=1 ttl=64 time=1.76 ms

Which produces traffic like this on the underlay network:

tcpdump -n -i eth0 host 34.70.161.180
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
09:01:58.159092 IP 10.128.0.9.59341 > 34.70.161.180.8472: OTV, flags [I] (0x08), overlay 0, instance 42
IP 192.168.200.1 > 192.168.200.2: ICMP echo request, id 20119, seq 1, length 64
09:01:58.160786 IP 34.70.161.180.48471 > 10.128.0.9.8472: OTV, flags [I] (0x08), overlay 0, instance 42
IP 192.168.200.2 > 192.168.200.1: ICMP echo reply, id 20119, seq 1, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Hopefully this is helpful to someone else. Thanks again to Joe Julian for a very helpful post.

Postscript: Dale Shaw pointed out on twitter that I might still be talking VXLAN, just on a weird port. This is supported by this comment I found on the internets: “when VXLAN was first implemented in linux, UDP ports were not specified. Many vendors use 8472, and Linux uses the same port. Later, IANA allocated 4789 as the port. If you need to use the IANA port, you need to specify it with dstport”.


Playing with the python prometheus query API


The last few days have been a bit icky around here, with my house apparently proudly residing in the major city with the dirtiest air in the world. So, I needed a distraction…

It has also been quite hot, so I wondered how my energy usage was going. I have prometheus monitoring of my power draw, so now seemed as good a time as any to learn how to do some historical querying over the API. I ended up with a python script which can output things like this: “Yesterday had a maximum temperature of 38 and we used 28.36 kwh. The average for similar days is 25.56 kwh.”

The code is on github if it is of interest to others. I am sure I could push more of this processing down into the prometheus engine, but I couldn’t see how to do it today. Hints welcome!
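
The core of the script is the query_range endpoint of the prometheus HTTP API. A hedged sketch of how to drive it (the server address and metric name below are placeholders, not what my script actually uses):

import datetime
import requests

PROM = 'http://localhost:9090'  # placeholder prometheus server

def query_range(expr, start, end, step='5m'):
    # Fetch a range of historical values for a PromQL expression
    resp = requests.get(PROM + '/api/v1/query_range',
                        params={'query': expr,
                                'start': start.timestamp(),
                                'end': end.timestamp(),
                                'step': step})
    resp.raise_for_status()
    return resp.json()['data']['result']

end = datetime.datetime.now()
start = end - datetime.timedelta(days=1)

# house_power_watts is a stand-in for whatever your metric is called
for series in query_range('house_power_watts', start, end):
    values = [float(v) for _, v in series['values']]
    print('%s: max %.2f' % (series['metric'], max(values)))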


Coming to grips with Kubernetes in 2020: online training


There are a few online training resources I've had a play with while learning Kubernetes, so I figure that's worth a quick write up. This is a follow-on from my post about Kubernetes podcasts I've tried. I've tried three training providers so far:

  • The Linux Foundation Kubernetes course (LFS258 Kubernetes Fundamentals) is probably the “go to” resource for many people, and is often sold bundled with the certification exams. Unfortunately, it is really terrible. It is by far the worst course I’ve seen so far.
  • On the other hand, the Linux Academy Kubernetes course is really good. Its flaw is that you have to sign up to Linux Academy, which provides you with all-you-can-eat courses for a rather steep annual fee.
  • Finally, I discovered Mumshad Mannambeth’s Udemy courses, and frankly they’re excellent. He’s put a huge amount of effort into them and it really shows. Even better, with Udemy’s regular sales you can pick up his three Kubernetes courses (intro, admin certification, and developer certification) for under $50 AUD. There are even plenty of online quizzes.

If I was going to pick a course to try, I’d definitely go with Mumshad.


Hacking on Arlec Christmas lights with tasmota


I'm loving the wide array of electrically certified home automation devices we're seeing now: light bulbs, sensors, power boards, and even Christmas lights. Specifically, Arlec is shipping these app controllable Christmas lights this year, which looked very much like they should work with Tasmota.

(I can't find this product online any more; I suspect Bunnings has sold out for the year.)

It turns out that these Arlec lights are an ESP8266 which can be flashed with tuya-convert v2 to run tasmota. Once flashed, you can control all of the functions available on the device itself, although there are parts of the protocol I haven't fully understood yet.

Let’s start off by flashing the device:

  • First off, boot your Raspberry Pi with tuya-convert. I used v2, and I suspect that's important here, so make sure you upgrade if you're using something old.
  • Next, put the bud lights into programming mode by holding the button on the control box down until the light strand turns off. Release and the strand should start blinking every couple of seconds.
  • Now run the tuya-convert flashing script.
  • Now go to the tasmota-XXXX essid and enter your wifi details into the captive portal. The light strand will now reboot.
  • The light strand should now appear on your wifi, and you can find the IP address and MAC address by asking your DHCP server nicely.

Now for some basic configuration in the web UI. Set the module type to “Tuya MCU (54)”, GPIO1 to “Tuya Tx (107)”, and GPIO3 to “Tuya RX (108)”. You’ll also want to set the usual site-specific configuration options like your MQTT server and so forth.

So what is a “Tuya MCU” when it is at home? Well, it turns out that some tuya devices have an esp8266 which just talks serial to another microcontroller. It kind of makes sense if you already have a microcontroller device you want to make “smart” and you’re super dooper lazy I suppose. There is surprisingly good manufacturer documentation online.

In the case of these bud lights, I have a strong theory that we can basically cycle the modes available by pressing the physical button, but I needed a way to validate that.

You can read more about how these devices work on the tuya protocols page, or you can just jump ahead to the simple programs I wrote to explore these devices. However, a quick summary of the serial protocol spoken between the esp8266 and the MCU is helpful. Packets look like this:

  • Frame header: fixed 2 byte value 0x55aa
  • Version: 0x00
  • Command word: a byte
  • Data length: 2 bytes
  • Function length: 2 bytes
  • Function command: 1 byte
  • Checksum: 1 byte

First off I wrote a simple program to monitor the state of the device registers (called dpIds, for “data point IDs”). It just uses the tasmota web console to constantly ask for the current state of the dpIds (“SerialSend5 55aa0001000000”, a hard coded MCU control packet) and prints out any changes. Note that for it to work, the web console log level needs to be set to debug.
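
As an aside, that hard coded packet is easy to construct by hand. A sketch in python, assuming (based on the packets I've seen) that the checksum is just the sum of all preceding bytes modulo 256:

def tuya_packet(command, payload=b''):
    packet = bytearray()
    packet += b'\x55\xaa'                      # frame header
    packet += b'\x00'                          # version
    packet.append(command)                     # command word
    packet += len(payload).to_bytes(2, 'big')  # data length
    packet += payload
    packet.append(sum(packet) & 0xff)          # checksum
    return bytes(packet)

# Command 0x01 with no payload reproduces the state request above
print(tuya_packet(0x01).hex())  # 55aa0001000000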

Now I can press the button on the device and see what dpIds change. A session looked like this (the notes like “solid white” are things I typed into the terminal as I went):

Clearly dpId 1 (type 1, a boolean) is the power state, with 0 being off and 1 being on. This is also easily testable of course: if you send “TuyaSend1 1,0” to the device using the web console it turns off, and if you send “TuyaSend1 1,1” it turns on. This assignment also maps to the way other Tuya MCU devices are configured, so it seems like a very safe assumption.

dpId 101 (type 4, a 1 byte enum) seems to be the mode. Walking through the possible values determined:

  • 0: fast pulse
  • 1: twinkle
  • 2: alternate
  • 3: alternate differently
  • 4: alternate and cause epileptic fits
  • 5: double alternate
  • 6: flash
  • 7: solid
  • 8: off
  • 9 onwards: brief all

dpId 107 (type 2, a 4 byte value) wasn't changeable via the physical button, so I wrote another simple script to send a bunch of values to it. It appears to be a brightness control for the white LEDs, with 0 being off and 99 being fully on. I haven't managed to find a brightness control for the coloured LEDs. The brightness control also doesn't appear to work in all modes.
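
That script wasn't anything clever; here is a sketch of the idea, driving the tasmota web API with a placeholder device address (TuyaSend2 is the value-typed sibling of the TuyaSend1 and TuyaSend4 commands used elsewhere in this post):

import time
import requests

DEVICE = 'http://192.168.1.50'  # placeholder: your strand's IP here

for value in range(0, 100, 10):
    # dpId 107 is a type 2 (4 byte value), so TuyaSend2 is the right verb
    requests.get(DEVICE + '/cm',
                 params={'cmnd': 'TuyaSend2 107,%d' % value})
    time.sleep(2)  # give the strand time to visibly change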

dpId 108 (type 2, a 4 byte value) remains a mystery to me at this time. It doesn't seem to change regardless of what values I send it.

dpId 109 (type 4, 1 byte enum) seems to be which strand is on. A value of 0 is just white LEDs, 1 is just coloured LEDs, 2 is all LEDs, and 3 is all LEDs but dimmer.

It sort of doesn’t matter that I haven’t fully decoded the inner workings of the device, because this is enough information for my use case. All I really want is for all the lights to be on solidly (that is, with no blinking). This is because I use them for lighting under my back pergola and the blinking would be quite annoying.

So how do you wire up Home Assistant to send serial packets to a slave MCU over MQTT? Home Assistant will already control the lights turning on and off because of the default relay implementation for dpId 1. What I need to be able to do is turn on all the lights in a solidly on configuration. For that, I can implement rules like:

>> Rule1
ON Event#0 DO TuyaSend4 101,0 ENDON
ON Event#1 DO TuyaSend4 101,1 ENDON
ON Event#2 DO TuyaSend4 101,2 ENDON
ON Event#3 DO TuyaSend4 101,3 ENDON
ON Event#4 DO TuyaSend4 101,4 ENDON
ON Event#5 DO TuyaSend4 101,5 ENDON
ON Event#6 DO TuyaSend4 101,6 ENDON
ON Event#7 DO TuyaSend4 101,7 ENDON
ON Event#8 DO TuyaSend4 101,8 ENDON
ON Event#9 DO TuyaSend4 101,9 ENDON

>> Rule1 ON

>> Rule2
ON Power1#state=1 DO TuyaSend1 1,1 ENDON
ON Power1#state=1 DO TuyaSend4 109,2 ENDON

>> Rule2 ON

You apply this rule by pasting it into the web console on the device. Note that there are four separate pasted commands. Rule1 exposes the modes from dpId101 as effects in Home Assistant, and Rule2 hooks to the power on MQTT command to ensure that the lights are set to solidly on via dpId 109.

The matching Home Assistant configuration looks like this:

# Arlec fairy lights on the back deck
- platform: mqtt
  name: "Back deck 1"
  command_topic: "cmnd/sonoff14/POWER"
  state_topic: "tele/sonoff14/STATE"
  state_value_template: "{{value_json.POWER}}"
  availability_topic: "tele/sonoff14/LWT"
  effect_command_topic: "cmnd/sonoff14/Event"
  effect_list:
    - 0
    - 1
    - 2
    - 3
    - 4
    - 5
    - 6
    - 7
    - 8
    - 9
  payload_on: "ON"
  payload_off: "OFF"
  payload_available: "Online"
  payload_not_available: "Offline"
  qos: 1
  retain: false

This gives me the ability to turn the fairy lights on and off via Home Assistant, and ensure that they're solidly on and not blinking. That's good enough for now.


Further thoughts on Azure instance start times


My post from the other day about slow instance starts on Azure caused some commentary (mainly on reddit) that prompted me to think more about all this. In the end, there were a few more experiments I wanted to run to see if I could squeeze more performance out of Azure.

First off, the logs from my initial testing suggest that resource groups are slow. The original terraform creates a resource group as part of the test and then cleans it up at the end. What if instead we had a single permanent resource group and created instances within that?

Here is a series of instance starts and deletes using the terraform from the last post:

You’ll notice that there’s no delete value for the last instance. That’s because terraform crashed and never deleted the instance. You can also see that instance starts are somewhat consistent, except for being slower in the second half of the test than the first, and occasionally spiking out to very very slow. Oh, and deletes are almost always really slow.

What happens if we use a permanent resource group and network? This means that all the “instance start terraform” is doing is creating a network interface and then an instance which uses that network interface. It has to be faster, but does it resolve our issues?

The dashed lines are the graph from above; the solid lines are the new data without resource group creation. You can see that abstracting away the resource group work has made a significant performance improvement. Instance start times are now generally under 100 seconds (which is still three times slower than AWS, and four or five times slower than Google).

So is it just that the Australian Azure zones are slow? I re-ran the new terraform against a US datacenter (East US). Here’s a zoom in of just the instance creates with the resource group extracted to make that clearer, for both data centers:

Interestingly, the Australian data center actually performs better than the US one, which isn’t what I would expect at all. You can also see in this test run that we do still see some unexpectedly slow instance launches, although they feel less frequent and smaller when they happen. That might also just be that I’m testing over a weekend and the data center might be more idle.

Looping back, I think we’ve learnt that resource groups are expensive. The last thing I wanted to dig into was what exactly was happening in those spikes where we had resource groups included. Luckily, they were happening about the point I started logging the terraform trace output of the run.

For example, run azure_1576926569_7_0_apply took 18 minutes and 3 seconds to create the instance. For those 18 minutes, terraform logs that the instance was marked by the Azure API as in provisioningState “Creating”. This correlates with operation id c983b272-fa32-4814-b858-adab3da4d9b1 sitting in state “InProgress”; unfortunately there isn't a reason logged for why that is. So I guess it's not possible as an Azure user to work out why things are sometimes slow.

To summarise some advice for terraform users on Azure — don’t create resource groups if you can avoid it. Create global resource groups and then place new objects into them instead. That said, you’re still going to have slower and less consistent performance than other clouds.

Finally, is instance start time a valid metric for cloud performance? Probably not. That said, it is table stakes to be in the conversation. Slow instance starts affect my overall experience of the cloud, as well as the workability of horizontal scaling techniques. This is especially true for instance start times which vary wildly like Azure’s do — I simply can’t trust that I can grow a horizontal scaling set with any sort of reasonable timeframe.
