Sunday, May 8, 2016

Root filesystem snapshots and kernel upgrades

On my laptop (which is running Arch), I decided to take periodic snapshots of the filesystem, so that I can easily revert bad upgrades (especially those involving a large and unknown set of interdependent packages). My toolset for this task is LVM2 and Snapper. Yes, I know that LVM2 is kind of discouraged and that Snapper also supports btrfs, but most of the points below apply to btrfs, too.

Snapper, when used with LVM2, requires not just LVM2, but thinly-provisioned LVM2 volumes. Fortunately, Arch can have its root filesystem on such a volume, so this is not a problem.

So, I have /boot on /dev/sda1, LVM on LUKS on /dev/sda2, root on a thinly-provisioned logical volume, /home on another thinly-provisioned volume, and swap on a non-thinly-provisioned volume. A separate /boot partition is needed because boot loaders generally don't understand thinly-provisioned LVM volumes, especially on encrypted disks. A separate volume for /home is needed because I don't want my new files in /home to be lost if I revert the system to an old snapshot. The same need to make a separate volume applies to other directories that contain data that should be preserved, but there are no such directories on my laptop. They can appear if I install e.g. PostgreSQL.

And now there is a problem. Rollback to a snapshot works, but only if there were no kernel updates between the time the snapshot was taken and the time of the revert. The root cause is that the kernel image is in /boot, while its loadable modules are in /usr/lib/modules. The modules are reverted, but the boot loader still loads the new kernel, which now has no corresponding modules.
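A quick way to see whether a system is in this broken state is to check that the module directory for the running kernel exists. This is only a minimal sketch: the has_modules helper is my own illustrative name, and it assumes the usual layout where modules live in a directory named after the kernel release.

```shell
#!/bin/sh
# Report whether a kernel release has a matching module directory.
# Usage: has_modules RELEASE [MODULES_ROOT]
# (has_modules is a hypothetical helper, not part of any package.)
has_modules() {
    release=$1
    modules_root=${2:-/usr/lib/modules}
    [ -d "$modules_root/$release" ]
}

if has_modules "$(uname -r)"; then
    echo "modules present for $(uname -r)"
else
    echo "no modules for $(uname -r): kernel and root filesystem are out of sync"
fi
```

If the second message appears after a rollback and reboot, the boot loader has started a kernel that is newer than the reverted root filesystem.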

There are two solutions: either revert the kernel and its initramfs, too, when reverting the root file system, or make sure that modules are not reverted. I have not investigated how to make the first option possible, even though it would be a perfect solution. However, I have tried to make sure that modules are not reverted, and I am not satisfied with the result.

The idea was to move the modules to /boot/modules, and make this location available somehow as /usr/lib/modules. Here "somehow" can mean either a symlink or a bind mount. A symlink doesn't work, because a kernel upgrade in Arch will turn it back into a directory. A bind mount doesn't work, either. The issue is that, by putting the modules on a non-root filesystem, one creates a circular dependency between local filesystem mounting and udev (this would apply to a symlink, too).

Indeed, systemd-udevd, on startup, maps the /usr/lib/modules/`uname -r`/modules.alias.bin file into memory. So, now it has a (real) dependency on /usr/lib/modules being mounted. However, mounting local filesystems from /etc/fstab sometimes depends on systemd-udevd, because of device nodes. So, bind-mounting /usr/lib/modules merely from /etc/fstab, using built-in systemd tools, cannot work.
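For reference, the bind mount in question would be expressed in /etc/fstab roughly as follows. This is a sketch using the paths mentioned above, not a copy of my actual file:

```
# <source>      <mount point>       <type>  <options>  <dump> <pass>
/boot/modules   /usr/lib/modules    none    bind       0      0
```

The line itself is valid; the problem is only the ordering at boot, since systemd-udevd wants the target before the local filesystems from fstab are guaranteed to be mounted.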

But it can work from a wrapper that starts before the real init:

#!/bin/sh
mount -n /boot              # /dev/sda1 is in devtmpfs and doesn't need udev
mount -n /usr/lib/modules   # there is still a line in fstab about that
exec /sbin/init "$@" 

But that's ugly. In the end, I removed the wrapper, installed an old known-working "linux" package, made a copy of the kernel, its initramfs and modules, upgraded the kernel again, and put the saved files back, so that they are no longer controlled by the package manager. So now I have a known-good kernel further down in the boot menu, and I know that its modules will always be present in my root filesystem as long as I don't revert to a snapshot older than today's.
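The copy-and-put-back procedure could be scripted roughly like this. It is a sketch under assumptions, not what I actually ran: the function names, the stash directory and the "-known-good" suffix are made up for illustration, while vmlinuz-linux and initramfs-linux.img are the default file names of the Arch "linux" package.

```shell
#!/bin/sh
# Keep a copy of a kernel, its initramfs and its modules outside pacman's control.
# save_kernel / restore_kernel, the stash path and the "-known-good" suffix
# are illustrative assumptions.

# Usage: save_kernel RELEASE [BOOT_DIR] [MODULES_ROOT] [STASH_DIR]
# Run while the known-working "linux" package is installed.
save_kernel() {
    release=$1
    boot=${2:-/boot}
    modules_root=${3:-/usr/lib/modules}
    stash=${4:-/root/known-good-kernel}

    mkdir -p "$stash" || return 1
    cp "$boot/vmlinuz-linux" "$stash/vmlinuz-linux-known-good" || return 1
    cp "$boot/initramfs-linux.img" "$stash/initramfs-linux-known-good.img" || return 1
    cp -a "$modules_root/$release" "$stash/" || return 1
}

# Usage: restore_kernel RELEASE [BOOT_DIR] [MODULES_ROOT] [STASH_DIR]
# Run after the kernel has been upgraded again.
restore_kernel() {
    release=$1
    boot=${2:-/boot}
    modules_root=${3:-/usr/lib/modules}
    stash=${4:-/root/known-good-kernel}

    # Refuse to overwrite a module directory that is still installed.
    [ ! -e "$modules_root/$release" ] || return 1
    cp "$stash/vmlinuz-linux-known-good" "$boot/" || return 1
    cp "$stash/initramfs-linux-known-good.img" "$boot/" || return 1
    cp -a "$stash/$release" "$modules_root/" || return 1
}
```

The intended order is save_kernel with the old release, then the kernel upgrade, then restore_kernel with the same release. The renamed files sit next to the packaged ones, so the boot loader still needs a separate menu entry pointing at them.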

And now one final remark. Remember that I said: "The same need to make a separate volume applies to other directories that contain data that should be preserved"? There is a temptation to apply this to the whole /var directory, but that would be wrong. If a system is being reverted to its old snapshot, a package database (which is in /var/lib/pacman) should be reverted, too. But /var/lib/pacman is under /var.

The conclusion is that Linux plumbers should think a bit about this "revert the whole system" use case, and maybe move some directories.
