reykfloeter – blog

On hosting a hackathon, vmd, and the switch

Posted by Reyk Floeter

If you have ever hosted such an event or a party for many guests, you will know the dilemma of the host: you're constantly concerned about your guests enjoying it, you have to take care of many trivial things, other things will break, and you get little to no time to attend or even enjoy it yourself. Fortunately, I had very experienced and welcome guests: only one vintage table and a vase broke (the table can be fixed), and I even found some time for hacking myself. I have to mention that this wouldn't have been possible without all the help from Mike (mikeb@), Bret (blambert@), Jan Schreiber, Malte Schalk and all the others who volunteered to prepare, set up, and tear down the event. While giving credit: Nick Böse made the artwork and Timm Markgraf took the pictures that you can find at http://k14.space/n2k15.

But let's get back to the social part. For the mid-hackathon social event, we visited the Christmas Market in Hannover's old town. I took the OpenBSD crowd, who came from all over Europe, the USA, Canada, Japan, and Australia, to the traditional street market that takes place every December to feed them some Glühwein (hot mulled wine with an extra shot) and food. At the K14 itself I was busy giving quick introductions to our coffee machines, especially on using the portafilter. Coworking, espresso, vintage stuff: you got me.

For the technical part, I worked on two things: vmd(8) and the switch. Some time ago over a beer, when Mike Larkin (mlarkin@) first mentioned his plans to implement a hypervisor for OpenBSD, I got all excited and offered help on the userland and networking side. After he committed his initial implementation a few weeks ago, I literally jumped on vmd(8), the virtual machine daemon that runs the userland part of vmm(4)-controlled VMs. I sometimes have the privilege to work on new things in OpenBSD, and with Mike's and Theo's blanket endorsement, I could just move forward and implement our plans for vmd(8) with many subsequent commits.

The daemon vmd(8) is accompanied by a tool, vmctl(8), to control and monitor the daemon at runtime (it was previously called host, err, vmmctl). The daemon manages the virtual machines by running the VM processes in userland, controlling the virtual machine monitor vmm(4) in the kernel, handling the device I/O and VM exits from vmm(4), as well as configuring and setting up the VMs. For all the details about vmm(4) and the VM-specific parts in userland, you had better ask Mike Larkin, as he is the architect and deserves all the credit; I only handle the infrastructure and configuration part.

Mike's initial vmd(8) already came with some built-in dropping of privileges, with a privileged master process and unprivileged, threaded child processes per VM. I split the master process into three pieces: the privileged parent (or vmd) process that opens disks and devices; the unprivileged vmm process that talks to the kernel side of vmm(4) and creates and monitors new VM processes; and the unprivileged control process that accepts connections from vmctl(8) on the control socket. All processes use pledge(2) to restrict the allowed system operations, and the unprivileged processes run as user _vmd, chrooted to /var/empty. The pledge(2) part is not quite true: the vmm and VM processes don't use pledge(2) yet, as they need the vmm-specific ioctls that aren't allowed by any of the supported promises, but I have a diff that would allow pledging "stdio vmm". The daemon has to open disk images, the kernel, and tap(4) network interfaces, but instead of doing this in the vmm master process directly, I moved it to the parent process, which opens the files and passes the file descriptors on.
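The descriptor passing between a privileged parent and an unprivileged child can be sketched in portable C with SCM_RIGHTS control messages over a UNIX-domain socket. This is only an illustration of the general technique, not code taken from vmd(8); the function names send_fd() and recv_fd() are hypothetical, and a real daemon would wrap this in its own message framing and error handling.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

#include <assert.h>
#include <string.h>

/*
 * Privileged side: pass one already opened descriptor over a
 * UNIX-domain socket.  Returns 0 on success, -1 on error.
 */
int
send_fd(int sock, int fd)
{
	struct msghdr	 msg;
	struct iovec	 iov;
	struct cmsghdr	*cmsg;
	union {
		struct cmsghdr	hdr;	/* forces correct alignment */
		unsigned char	buf[CMSG_SPACE(sizeof(int))];
	} cmsgbuf;
	char		 byte = 0;

	assert(fd >= 0);

	memset(&msg, 0, sizeof(msg));
	memset(&cmsgbuf, 0, sizeof(cmsgbuf));

	/* At least one byte of regular data has to go along. */
	iov.iov_base = &byte;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cmsgbuf.buf;
	msg.msg_controllen = sizeof(cmsgbuf.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return (sendmsg(sock, &msg, 0) == 1 ? 0 : -1);
}

/*
 * Unprivileged side: receive one descriptor.  Returns the new
 * descriptor, or -1 if none was attached or an error occurred.
 */
int
recv_fd(int sock)
{
	struct msghdr	 msg;
	struct iovec	 iov;
	struct cmsghdr	*cmsg;
	union {
		struct cmsghdr	hdr;
		unsigned char	buf[CMSG_SPACE(sizeof(int))];
	} cmsgbuf;
	char		 byte;
	int		 fd;

	memset(&msg, 0, sizeof(msg));
	memset(&cmsgbuf, 0, sizeof(cmsgbuf));

	iov.iov_base = &byte;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cmsgbuf.buf;
	msg.msg_controllen = sizeof(cmsgbuf.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return (-1);

	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return (-1);

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return (fd);
}
```

In such a design the parent would typically create the socket with socketpair(2) before forking, so that the chrooted child never needs filesystem access of its own.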

I added many new features to vmd(8) and vmctl(8), like a configuration file format, vm.conf(5), that includes virtual machine specifications in a human-readable style that has become very typical for OpenBSD. I initially implemented the configuration format in vmctl(8), but I took some time at n2k15 to move it to vmd(8) directly. The daemon now loads the optional configuration file on startup. The vmctl(8) tool will still allow starting new virtual machines on the command line without loading a configuration file, but all the advanced options will go into vm.conf(5). In addition to some tweaking and cleaning, I added some groundwork that will be needed for items on the TODO list: start pre-configured VMs from the command line, run instances of configured VMs, track permissions and allow users to run their own VMs, change the interface configuration, and assign interfaces to switches.

The vmctl(8) tool got some tweaks, and I changed the command line parser twice. Mike's initial tool was very basic and used a few getopt(3) arguments to start a VM. This was fine, but I saw the risk that it could turn into something like qemu --without-long-opts or any comparable tool that requires you to remember numerous letters and even getsubopt(3) CSV-like lists. I first changed it into a CLI-style format comparable to that of bgpctl(8), our networking daemons, or even relayctl(8), but it wasn't much appreciated by the getopt/POSIX purists in our group. So I changed it again into a style that takes a keyword and argument followed by getopt options, similar to Xen's xl but without long options.

Evolution of vmctl(8):

# vmmctl -S -m 512 -n 1 -b /some/path/disk.img -k /some/path/bsd
# vmmctl start "myvm" memory 512M interfaces 1 disk disk.img kernel /bsd
# vmctl start "myvm" -m 512M -i 1 -d disk.img -k /bsd

Any complicated parts will be restricted to vm.conf(5), and the keyword namespace allows us to get around getsubopt(3). While I still think that the second version is more intuitive to use, I have to admit that the last version looks cleaner.

So what is this switch about? A few days before the hackathon, I talked with Masahiko Yasuoka (yasuoka@) and Kazuya Goda (goda@) about our bridge(4). With the MP network stack overhaul it became obvious that our bridge needs some updates and cleanup. It is proven and reliable code that was written by Jason Wright (formerly jason@, the Wookie) almost 17 years ago. Many iterations and numerous improvements later, the bridge is at its core still based on the same code. Old is not bad, but it wasn't built for an MP networking stack, and it was done before anyone talked about virtual switches, flow tables, or split data and control planes for such things. People were looking into supporting Open vSwitch (OVS), and Goda actually ported it, but the cost of adding the complex kernel layer of OVS to OpenBSD was just too high, and its licensing is questionable (the Apache 2 license is not acceptable for us). So we were reconsidering a further modernization of bridge(4). Goda's OVS work helped us understand what we really need, and I came up with a simple idea: we don't need it or another virtual switch, we just need a controller to offload the control plane. OpenBSD is already doing bridging, VXLANs, VLANs, STP, routing domains and many other things in the kernel, so why should we move it to yet another complex daemon? All we need is a controller daemon and a well-defined, pluggable interface to handle the forwarding decisions from bridge(4) in the daemon and the Cloud: OpenFlow.

Fortunately, I had started such a simple, privilege-separated OpenFlow controller some time ago, but I never released it because it wasn't complete, wasn't comparable to any of the big controllers, and I didn't have an actual use case in OpenBSD for it. It only provided a simple learning switch that works with Open vSwitch or OpenFlow-enabled HP (HPE) switches. I also didn't find a satisfying name for it, as OpenFlow is an open protocol but also a very strict trademark, and calling it openflowd would violate their trademark policy (at least in the Land of the Free). I don't use funny or pet names for software, and OpenWolf or sdnflowd simply didn't work. After talking with Yasuoka and Goda I renamed it to switchd(8). Following the idea of using the OpenFlow protocol itself as our new kernel interface, Yasuoka and Goda worked on bridgeofp and managed to get it working as a simple layer 2 switch. This is a very brief summary, as the code hasn't been released yet, but watch out for what is coming over the next few months. We'll need it for many things, including the distributed virtual switching for vmd(8).
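For readers unfamiliar with the term, the forwarding decision of a simple learning switch can be sketched in a few lines of C. This is not switchd(8) code and it leaves out the OpenFlow protocol entirely; struct lswitch, switch_input() and the fixed-size table are hypothetical and only illustrate the idea: remember on which port a source address was seen, and flood as long as the destination is unknown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN		6
#define TABLE_SIZE	256
#define PORT_FLOOD	(-1)	/* send out of all ports but the input port */

struct mac_entry {
	uint8_t	mac[MAC_LEN];
	int	port;
	int	used;
};

/* Must be zeroed (e.g. with memset) before first use. */
struct lswitch {
	struct mac_entry table[TABLE_SIZE];
};

static unsigned int
mac_hash(const uint8_t *mac)
{
	unsigned int	h = 0;
	int		i;

	for (i = 0; i < MAC_LEN; i++)
		h = h * 31 + mac[i];
	return (h % TABLE_SIZE);
}

/*
 * Linear probing: return the entry holding this address, or the first
 * free slot where it would go, or NULL if the table is full.
 */
static struct mac_entry *
mac_slot(struct lswitch *sw, const uint8_t *mac)
{
	unsigned int	 i, idx = mac_hash(mac);
	struct mac_entry *e;

	for (i = 0; i < TABLE_SIZE; i++) {
		e = &sw->table[(idx + i) % TABLE_SIZE];
		if (!e->used || memcmp(e->mac, mac, MAC_LEN) == 0)
			return (e);
	}
	return (NULL);
}

/*
 * Handle one frame: learn the source address, then look up the
 * destination.  Returns the output port or PORT_FLOOD.
 */
int
switch_input(struct lswitch *sw, int in_port, const uint8_t *src,
    const uint8_t *dst)
{
	struct mac_entry *e;

	assert(sw != NULL);

	/* Learn (or refresh) unicast source addresses only. */
	if ((src[0] & 1) == 0 && (e = mac_slot(sw, src)) != NULL) {
		memcpy(e->mac, src, MAC_LEN);
		e->port = in_port;
		e->used = 1;
	}

	/* Broadcast and multicast frames are always flooded. */
	if (dst[0] & 1)
		return (PORT_FLOOD);

	e = mac_slot(sw, dst);
	if (e == NULL || !e->used)
		return (PORT_FLOOD);
	return (e->port);
}
```

In a controller-based setup, a decision like this would be made in the daemon for frames the switch does not know how to handle, and the result would be pushed back to the switch; entry aging is omitted here for brevity.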

I did enjoy the hackathon! Thanks to everyone who attended, especially to Jonathan Matthew (jmatthew@), who crammed himself into airplanes all the way from Brisbane to visit Germany's most underrated city. And of course to all our users who support OpenBSD: the donations allowed the OpenBSD Foundation to cover hotel costs for a number of developers.

Tags: blog, OpenBSD, vmd, switchd, n2k15, hackathon