In my previous post, I implemented several tools to facilitate IP address allocation for ipvlan-based devices, using DHCP for IPv4 and SLAAC for IPv6. The source code is in the tapalloc repository. I have since integrated it into my Guix system. Here is a sub-graph of the services on my laptop, generated with the herd graph command and filtering out nodes that don't link to one of the ontap services:
Dependencies between services on my system. Edges are directed to the service depended on.
The iwd service manages L2 connectivity for my wireless interface, keeping it connected to whatever known wireless network I happen to be in, but does not configure any IP addresses. The physical wlp3s0 interface has no IP addresses assigned, and most user processes use the host0 ipvlan interface, which is configured by the networking service, which runs a DHCP client.
The ontapd service socket is at /run/ontap/port.wlp3s0 for the wlp3s0 wireless interface, and is accessible by users in the kvm group. For ephemeral, one-off VMs, I wrote a qemu wrapper, runvm, that provisions interfaces from it:
#!/usr/bin/env -S execlineb -s1
# Usage: runvm wlp3s0 <qemu-args ...>
ontap -t ipvtap -m 1 /run/ontap/port.${1} --verbosity=debug --
importas -ui HWADDR ONTAP_HWADDR
importas -ui FD0 ONTAP_FD0
qemu-system-x86_64
-net nic,model=virtio,macaddr=${HWADDR}
-net tap,id=net0,fd=${FD0}
$@
For long-running processes, I developed a service template, ontap-exec-service-type, which acquires an ipvlan (or ipvtap) device, optionally in a new network namespace and/or UTS namespace, and executes a command. Here, for example, is a web server I create to serve a draft version of my personal site from my workstation:
(service ontap-exec-service-type
         (ontap-exec-configuration
          (port "wlp3s0")
          (type "ipvlan")
          (user "www")
          (caps "+net_bind_service") ;; binds to port 80
          (name "arroyocc")
          (exec #~(list #$(file-append darkhttpd "/bin/darkhttpd")
                        "/www" "--default-mimetype" "text/html"))))
This starts a web server in its own network and hostname namespace, with its own public and link-local IPv6 addresses (acquired by the Linux kernel). For the purposes of an HTTP server, the link-local address is not particularly useful: to connect to an address in the link-local fe80::/64 prefix, you must specify which interface to go over, usually with a syntax such as fe80::5c5f:6200:7fe:af20%eth0. That works with curl but is intentionally unsupported by the likes of Chrome and Firefox (amusingly, NetSurf does support it).
IPv6 addresses are long, typing them in on a phone is tedious, and they are wont to change when my laptop restarts, or I restart the service. The final missing piece of the puzzle is getting DNS names for these services. To do that, I wrote another program, ontap-zone, that tracks addresses assigned to sub-interfaces and emits A/AAAA records in the format expected by the tinydns-data program. The records are derived from the device alias, if one exists, and have the following format:
${name}.wan.${domain} - All global IP address(es)
${name}.lan.${domain} - Local (ULA for IPv6, RFC 1918 for IPv4) addresses
${name}.${domain} - lan or wan address based on client location
${name}.local - Link-local (fe80::/10, 169.254.0.0/16) addresses
This is fed into a tinydns process (actually, I am using shibari, a rewrite of tinydns that is better maintained and supports binding to IPv6 addresses). With that in place, I can resolve the addresses of the (aliased) links:
$ dns=2a02:a473:db:0:f9a0:ea48:aeab:96a2
$ for name in arroyocc{.wan,.lan,}.vanya
do
dig +noall +ans $name AAAA @$dns
done
arroyocc.wan.vanya. 86400 IN AAAA 2a02:a473:db:0:539b:fafb:34f3:4396
arroyocc.lan.vanya. 86400 IN AAAA fd4e:3732:3232:0:a8b6:cc46:610f:ef3b
arroyocc.vanya. 86400 IN AAAA 2a02:a473:db:0:539b:fafb:34f3:4396
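The naming scheme above can be sketched with Python's standard ipaddress module. This is only an illustration of the classification, not the code in ontap-zone: the function name record_names is hypothetical, and the split-horizon ${name}.${domain} record is left out because it depends on where the client is.

```python
import ipaddress
from typing import List


def record_names(name: str, domain: str, address: str) -> List[str]:
    """Return the DNS names under which `address` would be published.

    Hypothetical helper: it mirrors the wan/lan/local naming scheme,
    but is not taken from the ontap-zone source.
    """
    ip = ipaddress.ip_address(address)
    if ip.is_loopback or ip.is_unspecified or ip.is_multicast:
        return []  # never worth publishing
    # Link-local must be tested first: Python also reports these
    # ranges as "private".
    if ip.is_link_local:  # fe80::/10 and 169.254.0.0/16
        return [f"{name}.local"]
    if ip.is_global:
        return [f"{name}.wan.{domain}"]
    if ip.is_private:  # includes ULA (fc00::/7) and RFC 1918 ranges
        return [f"{name}.lan.{domain}"]
    return []


if __name__ == "__main__":
    for addr in ("2a02:a473:db:0:539b:fafb:34f3:4396",
                 "fd4e:3732:3232:0:a8b6:cc46:610f:ef3b",
                 "fe80::5c5f:6200:7fe:af20"):
        print(addr, record_names("arroyocc", "vanya", addr))
```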
I am using Linux's RFC 7217 implementation, which generates stable IPv6 addresses from some seed value. Because I use a hash of the workload's name as the seed, a workload is likely to receive the same IPv6 suffix across restarts. As a stop-gap, I've manually added the address of this DNS server to my DNS resolver (dnscache) configuration, but my goal is to delegate a subdomain to it, so it can be resolved from anywhere, and procure Let's Encrypt certificates. But I have to plan many details around security before I get that far.
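To illustrate why a fixed seed yields a stable suffix, here is a rough sketch of the RFC 7217 idea: the interface identifier is a hash of the prefix, an interface name, a duplicate-address counter, and a secret. The function stable_address is hypothetical, and SHA-256 is used as a stand-in; it does not reproduce the hash or the exact inputs the Linux kernel uses.

```python
import hashlib
import ipaddress


def stable_address(prefix: str, ifname: str, secret: bytes,
                   dad_counter: int = 0) -> ipaddress.IPv6Address:
    """Derive a stable address inside `prefix` (illustration only)."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    h = hashlib.sha256()
    h.update(net.network_address.packed[:8])  # the 64-bit prefix
    h.update(ifname.encode())                 # interface (workload) name
    h.update(bytes([dad_counter]))            # bumped on address collision
    h.update(secret)                          # the seed value
    iid = int.from_bytes(h.digest()[:8], "big")  # 64-bit interface id
    return ipaddress.IPv6Address(int(net.network_address) | iid)


if __name__ == "__main__":
    print(stable_address("2a02:a473:db::/64", "arroyocc", b"example-seed"))
```

The same inputs always produce the same address, while a different prefix or seed produces an unrelated suffix, which is the property that keeps the published records stable across restarts.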
Writing this program took a lot more experimentation with the Netlink API. I mentioned it in the previous post, but there is a netlink socket option, NETLINK_LISTEN_ALL_NSIDS, which allows a single netlink socket in a single namespace to receive notifications for all* network namespaces.
Only that's not quite true; in order to receive a notification for a foreign namespace, the current namespace (where the netlink socket was opened) must have already associated an ID with that namespace. You can do that with the ip netns set command, but that only lets you associate an ID with so-called "named" network namespaces: namespaces which were created with ip netns add or otherwise bound to /var/run/netns. Alternatively, you can use the RTM_NEWNSID rtnetlink request to allocate an ID for a namespace with nothing more than a process ID.
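As a rough illustration, an RTM_NEWNSID request that identifies the target namespace by process ID can be packed like this. The constants come from the Linux UAPI headers; the helper names are hypothetical, and the sketch only builds the bytes (sending them needs a NETLINK_ROUTE socket and sufficient privileges). Passing -1 as the ID is assumed to ask the kernel to pick one, as ip netns set does for "auto".

```python
import struct

RTM_NEWNSID = 88
NLM_F_REQUEST = 0x1
NLM_F_ACK = 0x4
NETNSA_NSID = 1   # s32: requested id, -1 for "any"
NETNSA_PID = 2    # u32: a process living in the target namespace
AF_UNSPEC = 0


def rtattr(attr_type: int, payload: bytes) -> bytes:
    """Encode one netlink attribute, padded to a 4-byte boundary."""
    length = 4 + len(payload)
    padding = (-length) % 4
    return struct.pack("=HH", length, attr_type) + payload + b"\0" * padding


def newnsid_request(pid: int, seq: int = 1) -> bytes:
    """Build an RTM_NEWNSID message for the netns of process `pid`."""
    body = struct.pack("=Bxxx", AF_UNSPEC)  # struct rtgenmsg, aligned
    body += rtattr(NETNSA_NSID, struct.pack("=i", -1))
    body += rtattr(NETNSA_PID, struct.pack("=I", pid))
    header = struct.pack("=IHHII", 16 + len(body), RTM_NEWNSID,
                         NLM_F_REQUEST | NLM_F_ACK, seq, 0)
    return header + body


if __name__ == "__main__":
    print(newnsid_request(1234).hex())
```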
So I modified the ontapd program, which allocates devices on behalf of connecting processes. Now, if it is allocating an ipvlan device for a peer, it will first ensure the peer's network namespace has an ID in the current namespace.
I also found that, from a single Netlink socket, you can actually request a dump of addresses and links in (mapped) foreign network namespaces by using the appropriate attributes (IFA_TARGET_NETNSID for addresses and IFLA_TARGET_NETNSID for links). However, you must use the NETLINK_GET_STRICT_CHK socket option, or your requests will be fulfilled in the network namespace of the socket. Lots of other little lessons are scattered throughout my git commit messages, and in code comments.
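A minimal sketch of such a request, for addresses, might look like the following. The constants are from the Linux UAPI headers; the function names are hypothetical, and this is not the code used in ontapd or ontap-zone. open_strict_socket only works on Linux (4.20 or later for the strict-check option).

```python
import socket
import struct

RTM_GETADDR = 22
NLM_F_REQUEST = 0x1
NLM_F_DUMP = 0x300            # NLM_F_ROOT | NLM_F_MATCH
IFA_TARGET_NETNSID = 10
SOL_NETLINK = 270
NETLINK_GET_STRICT_CHK = 12


def open_strict_socket() -> socket.socket:
    """Open a route netlink socket with strict request checking enabled."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                         socket.NETLINK_ROUTE)
    # Without this option the target-namespace attribute is not honoured
    # and the dump is answered from the socket's own namespace.
    sock.setsockopt(SOL_NETLINK, NETLINK_GET_STRICT_CHK, 1)
    return sock


def getaddr_dump_request(nsid: int, seq: int = 1) -> bytes:
    """Build an address dump request aimed at the namespace with id `nsid`."""
    ifaddrmsg = struct.pack("=BBBBI", 0, 0, 0, 0, 0)  # all-zero header
    attr = struct.pack("=HHi", 8, IFA_TARGET_NETNSID, nsid)
    length = 16 + len(ifaddrmsg) + len(attr)
    header = struct.pack("=IHHII", length, RTM_GETADDR,
                         NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
    return header + ifaddrmsg + attr


if __name__ == "__main__":
    print(getaddr_dump_request(0).hex())
```

The link dump is analogous, using RTM_GETLINK with a struct ifinfomsg header and the IFLA_TARGET_NETNSID attribute.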
As a stop-gap measure until I can get a stable(-ish) address to delegate a subdomain to, I wrote a program that responds to multicast DNS requests by proxying them to an ordinary (authoritative) DNS server. This allows me to resolve the address of an interface as ${name}.local, after setting up an mDNS resolver:
$ w3m -dump http://arroyocc.local/tech/ocaml/ring | head -n 5
Arroyo.cc / tech / ocaml / A message-oriented pipe
A message-oriented pipe
I am working on a side project that involves sending and receiving messages on
Unfortunately, while this works from my computers (with Avahi set up) and iOS devices, it does not work from my Android device. Oh well.
While writing this I thought about the problem of keeping names available (or unavailable) as I move my laptop to other networks, or suspend it, or turn it off. I am thinking that mDNS could be a nice way for a host to distribute the names it's responsible for, and I could use an always-on machine such as my router or a mini PC to collect those names and re-serve them from a well-known IP address, which I could delegate a public DNS domain to. To collect only the names I want to be public, I could re-publish only those records that had a valid DNSSEC signature when they were learned.
If I go down this path I would integrate the mDNS responder directly into the ontap-zone program, eliminating the requirement for a backend DNS server. It would be easier to meet the mDNS RFC requirements of promptly announcing, invalidating, and defending names from collisions from a program that has actual visibility to netlink and namespace events as they occur.
I would also prefer if I could track the actual hostname used by the process(es) in a network namespace. Right now I interpret the alias given to an interface as its desired hostname. This was a compromise, because I couldn't figure out a way to track which UTS namespaces were related to which network interfaces without periodically scanning /proc, and I want a solution that can work with little to no polling.
I think this can be accomplished by inserting a bit of BPF code around the sethostname(2) system call that writes the current pid to a ring buffer that my program can then consume. That would allow me to keep a running inventory of related net and UTS namespaces. A network and UTS namespace would be related as long as there was at least one process that was a member of both of them. For each UTS namespace, I would define A/AAAA records for every address in every network namespace related to it. After removing or combining duplicate names in some predictable manner, I would then serve them in a signed mDNS packet.
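The "related" relation described above is easy to state in code. This is a sketch of the bookkeeping only, assuming namespaces are identified by integers (for example the inode numbers behind /proc/<pid>/ns/uts and /proc/<pid>/ns/net); how the memberships are collected, by BPF or otherwise, is out of scope, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def related_net_namespaces(
    memberships: Iterable[Tuple[int, int, int]],
) -> Dict[int, Set[int]]:
    """Map each UTS namespace to the network namespaces related to it.

    `memberships` yields (pid, uts_ns, net_ns) triples. Two namespaces
    are related when at least one process is a member of both.
    """
    related: Dict[int, Set[int]] = defaultdict(set)
    for _pid, uts_ns, net_ns in memberships:
        related[uts_ns].add(net_ns)
    return dict(related)


if __name__ == "__main__":
    # Made-up sample: two processes share UTS namespace 100 but sit in
    # different network namespaces; a third process is on its own.
    sample = [(10, 100, 200), (11, 100, 201), (12, 101, 200)]
    print(related_net_namespaces(sample))
```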
See also
- Build log: IP auto-config for ipvlan mode l2 devices (Feb 2025): Interfacing OCaml with netlink and C
- A message-oriented pipe (Feb 2025): Fun with ring buffers