Setting up VXLAN between nested virt VMs on Google Compute Engine

I wanted to play with a VXLAN mesh between VMs on more than one hypervisor node, but the setup for VXLAN ended up being a separate post because it was a bit long. Read that post first if you want to follow the instructions here.

Now that we have a working VXLAN mesh between our two nodes we can move on to installing libvirt (which is called libvirt-daemon-system on Debian, not libvirt-bin as on Ubuntu):

sudo apt-get install -y qemu-kvm libvirt-daemon-system
sudo virsh net-start default
sudo virsh net-autostart --network default
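
qemu-kvm is only useful if the GCE node actually exposes hardware virtualisation to the guest OS. A minimal sketch of a check for that, assuming the usual vmx (Intel) or svm (AMD) CPU flags; the function name and the optional file argument are mine, not part of libvirt:

```shell
# Succeeds if the given cpuinfo file advertises Intel VT-x (vmx) or AMD-V (svm).
# The optional file argument exists only so the check can be tried on a copy.
has_virt_flags() {
    grep -Eqw 'vmx|svm' "${1:-/proc/cpuinfo}"
}

# Example:
#   has_virt_flags && echo "hardware virtualisation is exposed to this node"
```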

I’m going to use a little Python helper to launch my VMs, so I need some other dependencies as well:

sudo apt-get install -y python3-pip pkg-config libvirt-dev git

git clone https://github.com/mikalstill/shakenfist
cd shakenfist
git checkout 6bfac153d249752b27d224ad9d079095b640498e

sudo mkdir /srv/shakenfist
sudo cp template.debian.xml /srv/shakenfist/template.xml
sudo pip3 install -r requirements.txt

Let’s launch a quick test VM to make sure the helper works:

sudo python3 daemon.py
sudo virsh list

You can destroy that VM for now; it was only there to test the install.

sudo virsh destroy ...name...

Next we need to tweak the template that shakenfist is using to start instances so that it uses the bridge for networking (that template is the one you copied to /srv/shakenfist/template.xml earlier). Replace the interface section in the template with this on both nodes:

<interface type='bridge'>
  <mac address={{eth0_mac}}/>
  <source bridge='br-vxlan0'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

I know the bridge mentioned here doesn’t exist yet, but we’ll deal with that in a second. Before we start VMs though, we need a way of getting IP addresses to them. shakenfist can configure interfaces using config drive, but I’d prefer to use DHCP because who doesn’t love some additional complexity?

On one of the nodes install docker:


sudo apt-get install apt-transport-https ca-certificates curl gnupg2 software-properties-common
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io

Now we can set up DHCP. Create a place for the configuration file:

sudo mkdir /srv/shakenfist/dhcp

And then create the configuration file at /srv/shakenfist/dhcp/dhcpd.conf with contents like this:

default-lease-time 3600;
max-lease-time 7200;
option domain-name-servers 8.8.8.8;
authoritative;

subnet 192.168.200.0 netmask 255.255.255.0 {
  option routers 192.168.200.1;
  option broadcast-address 192.168.200.255;

  pool {
    range 192.168.200.10 192.168.200.254;
  }
}
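
The router and broadcast options only make sense if they belong to the subnet being declared. Here is a small sketch for checking that before starting dhcpd; the helper names are hypothetical and it only handles plain dotted-quad IPv4 addresses:

```shell
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeeds if address $1 lies inside network $2 with netmask $3.
in_subnet() (
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( addr & mask )) -eq $(( net & mask )) ]
)

# Example:
#   in_subnet 192.168.200.1 192.168.200.0 255.255.255.0 && echo "in subnet"
```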

Before we can start dhcpd, we need to move the VXLAN device into a bridge so we can add a device for the DHCP server to it. First off, remove the vxlan0 device from the last post:

sudo ip link set down dev vxlan0
sudo ip link del vxlan0

And now recreate it with a bridge:

sudo ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0
sudo bridge fdb append to 00:00:00:00:00:00 dst 34.70.161.180 dev vxlan0
sudo ip link add br-vxlan0 type bridge
sudo ip link set vxlan0 master br-vxlan0
sudo ip link set vxlan0 up
sudo ip link set br-vxlan0 up
sudo ip link add dhcp-vxlan0 type veth peer name dhcp-vxlan0p
sudo ip link set dhcp-vxlan0p master br-vxlan0
sudo ip link set dhcp-vxlan0 up
sudo ip link set dhcp-vxlan0p up
sudo ip addr add 192.168.200.1/24 dev dhcp-vxlan0

This block of commands:

  • recreated the vxlan0 interface
  • added it to the mesh with the other node again
  • created a bridge named br-vxlan0
  • moved the vxlan0 interface into it
  • created a veth pair called dhcp-vxlan0 and dhcp-vxlan0p
  • moved the peer part of that veth pair into the bridge
  • and then configured an IP on the external half of the veth pair
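
Since the same steps have to be repeated per node (and again after a reboot), they can be wrapped in a function. This is only a sketch: the function name and the "dhcp" switch are my own invention, and it prints the commands instead of running them so they can be reviewed first. The interface names, VXLAN id and address are the ones used above.

```shell
# Print the commands that rebuild vxlan0 inside a bridge.
#   $1 = public IP of the other node (required)
#   $2 = "dhcp" to also create the veth pair for the DHCP server (optional)
vxlan_bridge_cmds() {
    [ -n "${1:-}" ] || { echo "usage: vxlan_bridge_cmds PEER_IP [dhcp]" >&2; return 1; }
    cat <<EOF
ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0
bridge fdb append to 00:00:00:00:00:00 dst $1 dev vxlan0
ip link add br-vxlan0 type bridge
ip link set vxlan0 master br-vxlan0
ip link set vxlan0 up
ip link set br-vxlan0 up
EOF
    if [ "${2:-}" = "dhcp" ]; then
        cat <<EOF
ip link add dhcp-vxlan0 type veth peer name dhcp-vxlan0p
ip link set dhcp-vxlan0p master br-vxlan0
ip link set dhcp-vxlan0 up
ip link set dhcp-vxlan0p up
ip addr add 192.168.200.1/24 dev dhcp-vxlan0
EOF
    fi
}
```

After deleting the old vxlan0 device, something like `vxlan_bridge_cmds 34.70.161.180 dhcp | sudo sh -e` would apply the commands on the DHCP node. Leaving off the `dhcp` argument gives just the bridge without the veth pair.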

To make the bridge survive reboots you would need to add it to either /etc/network/interfaces or /etc/netplan/01-netcfg.yaml depending on your distribution, but that’s outside the scope of this post.

You should be able to ping again. From the other node give it a try:

$ ping 192.168.200.1
PING 192.168.200.1 (192.168.200.1) 56(84) bytes of data.
64 bytes from 192.168.200.1: icmp_seq=1 ttl=64 time=19.3 ms
64 bytes from 192.168.200.1: icmp_seq=2 ttl=64 time=0.571 ms

We need to do something similar on the other node so it can run VMs as well. It is a tiny bit simpler because there won’t be any DHCP server there. Remember to change 35.223.115.132 to the IP of your first node:

sudo ip link set down dev vxlan0
sudo ip link del vxlan0

sudo ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0
sudo bridge fdb append to 00:00:00:00:00:00 dst 35.223.115.132 dev vxlan0
sudo ip link add br-vxlan0 type bridge
sudo ip link set vxlan0 master br-vxlan0
sudo ip link set vxlan0 up
sudo ip link set br-vxlan0 up

Note that we can’t do a ping test now, because the second node no longer consumes an IP on the VXLAN network for its base OS.

Now we can start the docker container with dhcpd listening on dhcp-vxlan0:

sudo docker run -it --rm --init --net host -v /srv/shakenfist/dhcp:/data networkboot/dhcpd dhcp-vxlan0

This runs dhcpd interactively so we can see what happens. Now try starting a VM on the other node:

sudo python3 daemon.py

You can watch the VM booting using the “virsh console” command with the name of the VM from “virsh list”. The dhcpd process should show you something like this:

sudo docker run -it --rm --init --net host -v /srv/shakenfist/dhcp:/data networkboot/dhcpd dhcp-vxlan0
Internet Systems Consortium DHCP Server 4.3.5
Copyright 2004-2016 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Config file: /data/dhcpd.conf
Database file: /data/dhcpd.leases
PID file: /var/run/dhcpd.pid
Wrote 0 leases to leases file.
Listening on LPF/dhcp-vxlan0/06:ff:bc:7d:11:e3/192.168.200.0/24
Sending on   LPF/dhcp-vxlan0/06:ff:bc:7d:11:e3/192.168.200.0/24
Sending on   Socket/fallback/fallback-net
Server starting service.
DHCPDISCOVER from ee:95:4d:40:ca:a6 via dhcp-vxlan0
DHCPOFFER on 192.168.200.10 to ee:95:4d:40:ca:a6 (foo) via dhcp-vxlan0
DHCPREQUEST for 192.168.200.10 (192.168.200.1) from ee:95:4d:40:ca:a6 (foo) via dhcp-vxlan0
DHCPACK on 192.168.200.10 to ee:95:4d:40:ca:a6 (foo) via dhcp-vxlan0

You can see here that our new VM got the IP 192.168.200.10 from the DHCP server! It is moments like this when you don’t realise that this blog post took me hours to write that I feel really smart.
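
If you end up with more than a couple of VMs, picking leases out of that output by eye gets old. A rough sketch that pulls the address and MAC out of each DHCPACK line, assuming the "DHCPACK on <ip> to <mac>" layout shown above; the function name and the saved log file in the example are hypothetical:

```shell
# Read dhcpd output on stdin and print "IP MAC" for every DHCPACK line.
dhcp_acks() {
    awk '$1 == "DHCPACK" && $2 == "on" && $4 == "to" { print $3, $5 }'
}

# Example, with dhcpd's output saved to a file of your choosing:
#   dhcp_acks < dhcpd.log
```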

If we started a VM on the first node (the same command as for the second node), we’d now have two VMs on a virtual network which had working DHCP and could ping each other. I think that’s enough for one evening.

Setting up Cisco 7961 IP phones with asterisk

This blog post is just my notes on the installation process I followed. There is lots of documentation out there, but a lot of it is contradictory or incomplete. These notes are mostly about the configuration in my house, and might not work for you. Sorry about that.

The first step is that you need to be running your own DHCP server. Running a simple embedded one in something like your DSL modem won’t cut it, as you need to hand out non-standard options in your responses in order for the Cisco firmware on the phone to find the TFTP server you’ll set up in a bit. I’m not going to document installing DHCP here, as the Ubuntu packages are reasonable. In fact, the only annoying bit about the packages is that all the config et cetera is in a directory named /etc/dhcp, but for some reason I can’t explain the init script is /etc/init.d/isc-dhcp-server. That throws me every time.

You also need to know the MAC address of the phone. This is probably on a sticker on the bottom; failing that, it is on the screen during the phone boot process. Absolute worst case, it is in the DHCP logs once the phone starts to boot. The DHCP config for my phones looks like this:

    option domain-name "home.stillhq.com";
    option domain-search "home.stillhq.com", "stillhq.com";
    option domain-name-servers 192.168.1.14;
    
    option routers 192.168.1.254;
    option broadcast-address 192.168.1.255;
    
    option ntp-servers 192.168.1.14;
    option smtp-server 192.168.1.14;
    option time-servers 192.168.1.14;
    
    default-lease-time 600;
    max-lease-time 7200;
    
    option cisco-etherboot-server code 150 = ip-address;
    
    ...
    
    # IP Phones
    group {
      option tftp-server-name "192.168.1.14";
      option cisco-etherboot-server 192.168.1.14;
      option arp-cache-timeout 600;
    
      host cisco-7961-1 {
        hardware ethernet 00:1a:a1:ca:04:5b;
        fixed-address 192.168.1.50;
        option host-name "cisco-7961-1";
      }
    }
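
With more than one phone, the host entries inside that group get repetitive. A small sketch that prints one entry per phone in the same layout as above; the function name is made up, and the output still has to be pasted into the group block by hand:

```shell
# Print a dhcpd "host" entry for one phone.
#   $1 = host name, $2 = MAC address, $3 = fixed IP address
phone_host_stanza() {
    [ $# -eq 3 ] || { echo "usage: phone_host_stanza NAME MAC IP" >&2; return 1; }
    printf '  host %s {\n' "$1"
    printf '    hardware ethernet %s;\n' "$2"
    printf '    fixed-address %s;\n' "$3"
    printf '    option host-name "%s";\n' "$1"
    printf '  }\n'
}

# Example:
#   phone_host_stanza cisco-7961-1 00:1a:a1:ca:04:5b 192.168.1.50
```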
    

I also added the phone to DNS with a reverse entry, but I don’t think that is actually required for the phone to work. Next I needed a TFTP server, which is something I haven’t bothered to run for years. I used HPA’s TFTP server, which again has reasonable-ish packages. One gotcha is that you need to install xinetd as well, and then disable the init script for the HPA TFTP server. As best as I could tell the default non-xinetd configuration simply didn’t work, so I don’t know why they package it like that.
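
I didn’t keep notes on the exact xinetd configuration, so treat the following as a sketch of a conventional xinetd service entry for in.tftpd rather than a copy of mine. The /srv/tftp directory is an assumption, as is the function name; use whatever directory holds your firmware files.

```shell
# Print an xinetd service entry for the HPA TFTP server.
#   $1 = directory to serve files from (assumed default: /srv/tftp)
xinetd_tftp_stanza() {
    cat <<EOF
service tftp
{
    protocol    = udp
    socket_type = dgram
    wait        = yes
    user        = root
    server      = /usr/sbin/in.tftpd
    server_args = -s ${1:-/srv/tftp}
    disable     = no
}
EOF
}

# Example:
#   xinetd_tftp_stanza /srv/tftp | sudo tee /etc/xinetd.d/tftp
```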

Now for the really hard bit. You need to find the right firmware for the phone. I have my suspicions this is a lot easier for the modern Cisco phones, which have a web server by default and can be configured without TFTP. I say this as someone who doesn’t actually have one of these phones, but who has read some stuff online about them. These older phones are really TFTP happy, and seem to be constantly chatting to the TFTP server, even if they’re healthy. That might be an issue if you’re deploying thousands of these phones — you’d have to monitor TFTP server load and be aware of the extra IO during global phone firmware updates.

There are two ways to get the firmware for the phones. You can buy a support contract from Cisco for not very much money (around $20 a year), or you can find dodgy copies cached on the internet. If you choose to go the dodgy route, this Whirlpool thread has some useful advice.

Next we need to do a factory reset on the phone. This might not be needed in absolutely all cases, but it’s just safer. To reset the phone, hold down the # key and power cycle the phone. The lights at the side of the screen will start flashing in sequence after a while (nearly a minute). You now press 123456789*0# within 60 seconds of releasing the # key you were holding down. Note as well that the Cisco documentation for what lights flash is wrong, but it didn’t seem to affect the outcome.

The phone is really slow to boot up (several minutes). Once it has booted, it grabs network configuration from DHCP as shown above, and then starts requesting files from the TFTP server. Here’s a log of all the requests from my phone booting when it’s happy:

    $ tail -f syslog | grep RRQ
    Nov 11 06:24:53 molokai in.tftpd[8221]: RRQ from 192.168.1.50 filename term61.default.loads
    Nov 11 06:24:54 molokai in.tftpd[8222]: RRQ from 192.168.1.50 filename Jar41sip.8-3-0-50.sbn
    Nov 11 06:24:57 molokai in.tftpd[8223]: RRQ from 192.168.1.50 filename cnu41.8-3-0-50.sbn
    Nov 11 06:25:00 molokai in.tftpd[8224]: RRQ from 192.168.1.50 filename apps41.8-3-0-50.sbn
    Nov 11 06:25:11 molokai in.tftpd[8235]: RRQ from 192.168.1.50 filename dsp41.8-3-0-50.sbn
    Nov 11 06:25:15 molokai in.tftpd[8236]: RRQ from 192.168.1.50 filename cvm41sip.8-3-0-50.sbn
    Nov 11 06:26:33 molokai in.tftpd[8242]: RRQ from 192.168.1.50 filename CTLSEP001AA1CA045B.tlv
    Nov 11 06:26:33 molokai in.tftpd[8243]: RRQ from 192.168.1.50 filename SEP001AA1CA045B.cnf.xml
    Nov 11 06:26:41 molokai in.tftpd[8244]: RRQ from 192.168.1.50 filename SIP41.8-3-1S.loads
    Nov 11 06:26:42 molokai in.tftpd[8245]: RRQ from 192.168.1.50 filename Jar41sip.8-3-0-50.sbn
    Nov 11 06:26:44 molokai in.tftpd[8246]: RRQ from 192.168.1.50 filename cnu41.8-3-0-50.sbn
    Nov 11 06:26:47 molokai in.tftpd[8247]: RRQ from 192.168.1.50 filename apps41.8-3-0-50.sbn
    Nov 11 06:26:59 molokai in.tftpd[8249]: RRQ from 192.168.1.50 filename dsp41.8-3-0-50.sbn
    Nov 11 06:27:02 molokai in.tftpd[8253]: RRQ from 192.168.1.50 filename cvm41sip.8-3-0-50.sbn
    Nov 11 06:27:59 molokai in.tftpd[8256]: RRQ from 192.168.1.50 filename CTLSEP001AA1CA045B.tlv
    Nov 11 06:27:59 molokai in.tftpd[8257]: RRQ from 192.168.1.50 filename SEP001AA1CA045B.cnf.xml
    Nov 11 06:28:14 molokai in.tftpd[8261]: RRQ from 192.168.1.50 filename /mk-sip.jar
    Nov 11 06:28:15 molokai in.tftpd[8262]: RRQ from 192.168.1.50 filename US/g3-tones.xml
    Nov 11 06:28:18 molokai in.tftpd[8263]: RRQ from 192.168.1.50 filename dialplan.xml
    

No, I don’t know why it requests those files at the start twice either, but it does it across multiple test factory resets. There are two files there which embed the MAC address of the phone into the filename, so you’ll have different names for those files in your setup. Note that the file CTLSEP001AA1CA045B.tlv doesn’t exist in my configuration, and that doesn’t seem to have caused anything bad to happen. Filenames are also case sensitive, so that might make things more exciting for you. Almost all of the other files are firmware.
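
Going by the log above, the two per-phone names are just the MAC address in upper case with the colons removed, wrapped in a fixed prefix and suffix. A quick sketch for working out what your phone will ask for (the function name is hypothetical):

```shell
# Print the two per-phone file names requested for the given MAC address.
phone_file_names() (
    id=$(printf '%s' "$1" | tr -d ':.-' | tr 'a-f' 'A-F')
    printf 'SEP%s.cnf.xml\nCTLSEP%s.tlv\n' "$id" "$id"
)

# Example:
#   phone_file_names 00:1a:a1:ca:04:5b
```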

I recall creating a file named XMLDefault.cnf.xml which has a bunch of stuff in it, but I can’t see any evidence that it is used during the boot process, so I think that might have been a dead end that I didn’t need to go down.
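
One way to check whether a given file is ever asked for is to boil the syslog down to the distinct filenames in the RRQ lines. A minimal sketch, assuming the log format shown above; both function names are made up:

```shell
# Read syslog lines on stdin and print each distinct TFTP filename requested.
tftp_requested_files() {
    awk '/ RRQ from / { for (i = 1; i < NF; i++) if ($i == "filename") print $(i + 1) }' | sort -u
}

# Succeeds if filename $2 appears in an RRQ line of log file $1.
was_requested() {
    tftp_requested_files < "$1" | grep -Fxq -- "$2"
}

# Example:
#   was_requested syslog XMLDefault.cnf.xml || echo "never requested"
```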

The format for SEP001AA1CA045B.cnf.xml is well documented in the links below, so I will leave that as an exercise for the reader. Feel free to ask questions in the comments to this post, and I’ll do my best to be helpful, bearing in mind that I am absolutely not an expert at this stuff.

Here’s a list of the web pages I thought were most helpful during my adventure:
