Build log: IP auto-config for ipvlan mode l2 devices, part 2

In my previous post, I implemented several tools to facilitate IP address allocation for ipvlan-based devices, using DHCP for IPv4 and SLAAC for IPv6. The source code is in the tapalloc repository. I have since integrated it into my Guix system. Here is a sub-graph of the services on my laptop, generated with the herd graph command and filtered down to the nodes that link to one of the ontap services:

[Figure: graph of dependencies between services on my system, showing the relationship between the "iwd", "ontap", and "networking" services]

Dependencies between services on my system. Edges are directed to the service depended on.

The iwd service manages L2 connectivity for my wireless interface, keeping it connected to whatever known wireless network I happen to be in, but does not configure any IP addresses. The physical wlp3s0 interface has no addresses assigned; most user processes use the host0 ipvlan interface, which is configured by the networking service, which in turn runs a DHCP client.

The ontapd service socket is at /run/ontap/port.wlp3s0 for the wlp3s0 wireless interface, and is accessible by users in the kvm group. For ephemeral, one-off VMs, I wrote a qemu wrapper, runvm, that provisions interfaces from it:

#!/usr/bin/env -S execlineb -s1
# Usage: runvm wlp3s0 <qemu-args ...>

ontap -t ipvtap -m 1 /run/ontap/port.${1} --verbosity=debug --
importas -ui HWADDR ONTAP_HWADDR
importas -ui FD0 ONTAP_FD0

qemu-system-x86_64
	-net nic,model=virtio,macaddr=${HWADDR}
	-net tap,id=net0,fd=${FD0}
	$@

For long-running processes, I developed a service template, ontap-exec-service-type, which acquires an ipvlan (or ipvtap) device, optionally in a new network namespace and/or UTS namespace, and executes a command. Here, for example, is a web server I created to serve a draft version of my personal site from my workstation:

(service ontap-exec-service-type
  (ontap-exec-configuration
    (port "wlp3s0")
    (type "ipvlan")
    (user "www")
    (caps "+net_bind_service") ;; binds to port 80
    (name "arroyocc")
    (exec #~(list #$(file-append darkhttpd "/bin/darkhttpd")
                     "/www" "--default-mimetype" "text/html"))))

This starts a web server in its own network and hostname namespace, with its own public and link-local IPv6 addresses (acquired by the Linux kernel). For the purposes of an HTTP server, the link-local address is not particularly useful: to connect to an address in the link-local fe80::/10 prefix, you must specify which interface to go over, usually with a syntax such as

fe80::5c5f:6200:7fe:af20%eth0

which works with curl but is intentionally unsupported by the likes of Chrome and Firefox (amusingly, NetSurf does support it).
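To make the role of the interface suffix concrete, here is a minimal Python sketch (3.9 or later) of what a client has to do with such an address before it can connect: split off the zone and turn it into the numeric scope ID that an AF_INET6 socket address carries. The helper name scoped_sockaddr and its name_to_index parameter are my own, chosen for illustration.

```python
import ipaddress
import socket


def scoped_sockaddr(text, port, name_to_index=socket.if_nametoindex):
    """Turn 'fe80::1%eth0' into the 4-tuple an AF_INET6 socket expects."""
    addr = ipaddress.IPv6Address(text)  # Python 3.9+ parses the %zone suffix
    zone = addr.scope_id                # 'eth0', or None if there was no suffix
    if addr.is_link_local and zone is None:
        raise ValueError("a link-local address needs a %zone suffix")
    if zone is None:
        index = 0                       # no scope: let the routing table decide
    elif zone.isdigit():
        index = int(zone)               # the zone may also be a numeric index
    else:
        index = name_to_index(zone)     # interface name -> interface index
    host = str(ipaddress.IPv6Address(int(addr)))  # address without the zone
    # AF_INET6 socket address: (host, port, flowinfo, scope_id)
    return (host, port, 0, index)
```

The resulting tuple can be passed to connect() on an AF_INET6 socket. A browser that refuses the %eth0 suffix in a URL has no way to fill in that last field, which is why the link-local address is of little use there.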

IPv6 addresses are long, typing them in on a phone is tedious, and they are wont to change when my laptop restarts, or I restart the service. The final missing piece of the puzzle is getting DNS names for these services. To do that, I wrote another program, ontap-zone, that tracks addresses assigned to sub-interfaces and emits A/AAAA records in the format expected by the tinydns-data program. The records are derived from the device alias, if one exists, and have the following format:

${name}.wan.${domain} - All global IP address(es)
${name}.lan.${domain} - Local (ULA for IPv6, RFC 1918 for IPv4) addresses
${name}.${domain} - lan or wan address based on client location
${name}.local - Link-local (fe80::/10, 169.254.0.0/16) addresses
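The classification behind these names can be sketched in a few lines of Python. This is not the ontap-zone code, only an illustration of the scheme: the function names are mine, the location-dependent ${name}.${domain} form is left out, and the AAAA line uses the generic ":fqdn:type:rdata:ttl" record syntax of stock tinydns-data, which may differ from what ontap-zone or shibari actually use.

```python
import ipaddress

# Ranges named in the list above. Anything else that Python considers
# globally routable is treated as a "wan" address.
LINK_LOCAL_NETS = [ipaddress.ip_network(n)
                   for n in ("169.254.0.0/16", "fe80::/10")]
LAN_NETS = [ipaddress.ip_network(n)
            for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
                      "fc00::/7")]


def classify(addr):
    """Return 'local', 'lan', 'wan', or None for other addresses."""
    ip = ipaddress.ip_address(addr)
    if any(ip in net for net in LINK_LOCAL_NETS):
        return "local"
    if any(ip in net for net in LAN_NETS):
        return "lan"
    return "wan" if ip.is_global else None


def record(name, addr, domain, ttl=86400):
    """One tinydns-data line for the address, or None if it has no class."""
    ip = ipaddress.ip_address(addr)
    kind = classify(addr)
    if kind is None:
        return None
    host = f"{name}.local" if kind == "local" else f"{name}.{kind}.{domain}"
    if ip.version == 4:
        return f"+{host}:{ip}:{ttl}"           # "+" line: a plain A record
    # Generic record, type 28 (AAAA), rdata as octal-escaped bytes.
    rdata = "".join(f"\\{byte:03o}" for byte in ip.packed)
    return f":{host}:28:{rdata}:{ttl}"
```

Link-local is tested first because 169.254.0.0/16 and fe80::/10 would otherwise be swallowed by broader "private" checks.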

This is fed into a tinydns process (actually, I am using shibari, a rewrite of tinydns that is better maintained and supports binding to IPv6 addresses). With that in place, I can resolve the addresses of the (aliased) links:

$ dns=2a02:a473:db:0:f9a0:ea48:aeab:96a2
$ for name in arroyocc{.wan,.lan,}.vanya
  do
      dig +noall +ans $name AAAA @$dns
  done
arroyocc.wan.vanya.	86400	IN	AAAA	2a02:a473:db:0:539b:fafb:34f3:4396
arroyocc.lan.vanya.	86400	IN	AAAA	fd4e:3732:3232:0:a8b6:cc46:610f:ef3b
arroyocc.vanya.		86400	IN	AAAA	2a02:a473:db:0:539b:fafb:34f3:4396

I am using Linux's RFC 7217 implementation, which generates stable IPv6 addresses from a seed value. Because I use a hash of the workload's name as the seed, a workload is likely to receive the same IPv6 suffix across restarts. As a stop-gap, I've manually added the address of this DNS server to my DNS resolver (dnscache) configuration, but my goal is to delegate a subdomain to it, so that it can be resolved from anywhere, and to procure Let's Encrypt certificates. I have many security details to plan before I get that far.
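The idea behind RFC 7217 is that the interface identifier is a hash of the prefix, some interface data, a collision counter, and a secret. The Python sketch below shows that shape, with the secret derived from a workload name. It is an illustration only: the kernel uses its own hash and inputs, so the addresses computed here will not match the ones Linux assigns, and both function names are made up for this example.

```python
import hashlib
import ipaddress


def secret_from_name(name):
    """Derive a 128-bit secret from a workload name.

    The result is an IPv6Address because the kernel's stable_secret
    sysctl is written in IPv6 address notation.
    """
    digest = hashlib.sha256(name.encode()).digest()
    return ipaddress.IPv6Address(digest[:16])


def stable_address(prefix, secret, iface_id=b"", dad_counter=0):
    """RFC 7217-style address: IID = F(prefix, iface, counter, secret)."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("SLAAC prefixes are expected to be /64")
    h = hashlib.sha256()
    h.update(net.network_address.packed[:8])  # the 64-bit prefix
    h.update(iface_id)                        # e.g. a MAC address
    h.update(bytes([dad_counter]))            # bumped after an address conflict
    h.update(secret.packed)
    iid = int.from_bytes(h.digest()[:8], "big")
    return ipaddress.IPv6Address(int(net.network_address) | iid)
```

Because the prefix is part of the hash input, the suffix stays the same on one network but changes when the laptop moves to another, which is the privacy property RFC 7217 is after.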

Writing this program took a lot more experimentation with the Netlink API. As I mentioned in the previous post, there is a netlink socket option, NETLINK_LISTEN_ALL_NSID, which allows a single netlink socket in a single namespace to receive notifications for all* network namespaces.

Only that's not quite true; in order to receive a notification for a foreign namespace, the current namespace (where the netlink socket was opened) must have already associated an ID with that namespace. You can do that with the ip netns set command, but that only lets you associate an ID with so-called "named" network namespaces -- namespaces which were created with ip netns add or otherwise bound to /var/run/netns. Fortunately, the RTM_NEWNSID rtnetlink request can allocate an ID for any namespace, given nothing more than the ID of a process inside it.
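For reference, this is roughly what such a request looks like on the wire, sketched in Python rather than the language ontapd is written in. The constants are taken from the Linux UAPI headers; the function names are mine. The message is a netlink header, a struct rtgenmsg, a NETNSA_PID attribute naming the process, and a NETNSA_NSID attribute, where -1 asks the kernel to pick a free ID.

```python
import socket
import struct

# Values from <linux/rtnetlink.h>, <linux/netlink.h>, <linux/net_namespace.h>
RTM_NEWNSID = 88
NLM_F_REQUEST = 0x01
NLM_F_ACK = 0x04
NETNSA_NSID = 1
NETNSA_PID = 2


def nlattr(kind, payload):
    """One netlink attribute: 4-byte header, payload, padding to 4 bytes."""
    length = 4 + len(payload)
    return struct.pack("=HH", length, kind) + payload + b"\0" * (-length % 4)


def newnsid_request(pid, nsid=-1, seq=1):
    """Build an RTM_NEWNSID message for the network namespace of `pid`."""
    body = struct.pack("=Bxxx", socket.AF_UNSPEC)         # struct rtgenmsg
    body += nlattr(NETNSA_PID, struct.pack("=I", pid))    # whose namespace
    body += nlattr(NETNSA_NSID, struct.pack("=i", nsid))  # -1: auto-allocate
    header = struct.pack("=IHHII", 16 + len(body), RTM_NEWNSID,
                         NLM_F_REQUEST | NLM_F_ACK, seq, 0)
    return header + body


def send_request(message):
    """Send to the kernel and return the raw reply (needs CAP_NET_ADMIN)."""
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                       socket.NETLINK_ROUTE) as sock:
        sock.bind((0, 0))
        sock.sendto(message, (0, 0))
        return sock.recv(65536)
```

The reply is an NLMSG_ERROR message whose error field is zero on success; if the namespace already has an ID in the current namespace, the kernel rejects the request instead.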

So I modified the ontapd program, which allocates devices on behalf of connecting processes. Now, if it is allocating an ipvlan device for a peer, it will first ensure the peer's network namespace has an ID in the current namespace.

I also found that, from a single Netlink socket, you can actually request a dump of addresses and links in (mapped) foreign network namespaces by using the appropriate attributes (IFA_TARGET_NETNSID for addresses and IFLA_TARGET_NETNSID for links). However, you must use the NETLINK_GET_STRICT_CHK socket option, or your requests will be fulfilled in the network namespace of the socket. Lots of other little lessons are scattered throughout my git commit messages, and in code comments.
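A minimal Python sketch of such a dump request for addresses, assuming the foreign namespace already has an ID assigned in the socket's namespace. The numeric constants come from the Linux UAPI headers and the function names are hypothetical. With strict checking enabled, the kernel expects a real struct ifaddrmsg with its unused fields zeroed, followed by the IFA_TARGET_NETNSID attribute.

```python
import socket
import struct

# Values from <linux/netlink.h>, <linux/rtnetlink.h>, <linux/if_addr.h>
SOL_NETLINK = 270
NETLINK_GET_STRICT_CHK = 12
RTM_GETADDR = 22
NLM_F_REQUEST = 0x01
NLM_F_DUMP = 0x300
IFA_TARGET_NETNSID = 10


def getaddr_dump_request(nsid, family=socket.AF_UNSPEC, seq=1):
    """Build an RTM_GETADDR dump for the namespace mapped to `nsid`."""
    # struct ifaddrmsg: family, prefixlen, flags, scope, index
    body = struct.pack("=BBBBI", family, 0, 0, 0, 0)
    # A single attribute: 4-byte header plus a signed 32-bit namespace ID
    body += struct.pack("=HHi", 8, IFA_TARGET_NETNSID, nsid)
    header = struct.pack("=IHHII", 16 + len(body), RTM_GETADDR,
                         NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
    return header + body


def open_strict_socket():
    """Open an rtnetlink socket with strict request checking enabled."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                         socket.NETLINK_ROUTE)
    # Without this option the kernel does not look at the attribute and
    # answers for the socket's own namespace.
    sock.setsockopt(SOL_NETLINK, NETLINK_GET_STRICT_CHK, 1)
    sock.bind((0, 0))
    return sock
```

A link dump follows the same pattern with RTM_GETLINK, a struct ifinfomsg, and IFLA_TARGET_NETNSID.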

As a stop-gap measure until I can get a stable(-ish) address to delegate a subdomain to, I wrote a program that responds to multicast DNS requests by proxying them to an ordinary (authoritative) DNS server. This allows me to resolve the address of an interface as ${name}.local, after setting up an mDNS resolver:

$ w3m -dump  http://arroyocc.local/tech/ocaml/ring | head -n 5
Arroyo.cc / tech / ocaml / A message-oriented pipe

 A message-oriented pipe

I am working on a side project that involves sending and receiving messages on

Unfortunately, while this works from my computers (with Avahi set up) and iOS devices, it does not work from my Android device. Oh well.

While writing this I thought about the problem of keeping names available (or unavailable) as I move my laptop to other networks, or suspend it, or turn it off. I am thinking that mDNS could be a nice way for a host to distribute the names it's responsible for, and I could use an always-on machine such as my router or a mini-PC to collect those names and re-serve them from a well-known IP address, to which I could delegate a public DNS domain. To collect only the names I want to be public, I could re-publish only those records that had a valid DNSSEC signature when they were learned.

If I go down this path I would integrate the mDNS responder directly into the ontap-zone program, eliminating the requirement for a backend DNS server. It would be easier to meet the mDNS RFC requirements of promptly announcing, invalidating, and defending names against collisions from a program that has actual visibility into netlink and namespace events as they occur.

I would also prefer if I could track the actual hostname used by the process(es) in a network namespace. Right now I interpret the alias given to an interface as its desired hostname. This was a compromise, because I couldn't figure out a way to track which UTS namespaces were related to which network interfaces without periodically scanning /proc, and I want a solution that can work with little to no polling.

I think this can be accomplished by inserting a bit of BPF code around the sethostname(2) system call that writes the current pid to a ring buffer that my program can then consume. That would allow me to keep a running inventory of related network and UTS namespaces. A network and UTS namespace would be related as long as there was at least one process that was a member of both of them. For each UTS namespace, I would define A/AAAA records for every address in every network namespace related to it. After removing or combining duplicate names in some predictable manner, I would then serve the result in a signed mDNS packet.
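The bookkeeping side of that idea is small. Below is a Python sketch of the inventory, assuming the BPF program delivers pids and that a separate exit notification is available to drop them again; the class and its method names are hypothetical. Namespace identity is read from the /proc/<pid>/ns symlinks, whose targets look like net:[4026531840].

```python
import os
from collections import Counter


def ns_id(pid, kind):
    """Identity of a process's namespace, e.g. 'uts:[4026531838]'."""
    return os.readlink(f"/proc/{pid}/ns/{kind}")


class NamespaceInventory:
    """Tracks which (UTS, net) namespace pairs share at least one process."""

    def __init__(self, lookup=ns_id):
        self.lookup = lookup
        self.by_pid = {}        # pid -> (uts, net) last seen for that pid
        self.pairs = Counter()  # (uts, net) -> number of known member pids

    def observe(self, pid):
        """Record or refresh a pid, e.g. on a sethostname event."""
        self.forget(pid)
        pair = (self.lookup(pid, "uts"), self.lookup(pid, "net"))
        self.by_pid[pid] = pair
        self.pairs[pair] += 1

    def forget(self, pid):
        """Drop a pid, e.g. when the process exits."""
        pair = self.by_pid.pop(pid, None)
        if pair is not None:
            self.pairs[pair] -= 1
            if self.pairs[pair] == 0:
                del self.pairs[pair]  # the last common member is gone

    def nets_for(self, uts):
        """Network namespaces currently related to a UTS namespace."""
        return {net for (u, net) in self.pairs if u == uts}
```

The lookup parameter exists so the /proc reads can be swapped out, for instance if the BPF program were extended to report namespace inode numbers directly.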

See also
Build log: IP auto-config for ipvlan mode l2 devices Feb 2025
Interfacing OCaml with netlink and C
A message-oriented pipe Feb 2025
Fun with ring buffers