Hacker News
by rleigh 3222 days ago
My schroot tool, used for building Debian packages, could reliably panic a kernel in under five minutes when it was rapidly creating and destroying LVM snapshots in parallel (24 parallel jobs, with lifetimes ranging from seconds to hours and a median of about a minute).
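For anyone wanting to reproduce this kind of churn on a scratch machine, here is a minimal Python sketch of a parallel snapshot create/destroy loop. It is not schroot's code. The volume group `vg0`, origin LV `sid`, the 1G snapshot size and the `churn<N>` naming are hypothetical, and lifetimes are drawn uniformly from a few seconds rather than the seconds-to-hours spread described above. Real runs need root and a disposable VG, so the command runner is injectable for dry runs.

```python
import random
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def snapshot_commands(vg, origin, name, size="1G"):
    """Argument lists to create, then remove, one LVM snapshot of vg/origin."""
    create = ["lvcreate", "--snapshot", "--size", size, "--name", name, f"{vg}/{origin}"]
    remove = ["lvremove", "--force", f"{vg}/{name}"]
    return create, remove


def default_run(argv):
    # check=True raises CalledProcessError, so a failed or undeletable LV is noticed.
    subprocess.run(argv, check=True)


def stress(vg, origin, jobs=24, rounds=10, max_life=5.0, run=default_run, sleep=time.sleep):
    """Run `jobs` workers, each creating and removing its own snapshot `rounds` times."""

    def worker(job):
        for _ in range(rounds):
            # Hypothetical naming: one snapshot name per worker, reused every round.
            create, remove = snapshot_commands(vg, origin, f"churn{job}")
            run(create)
            # Snapshot lifetime; values near 0 make removal race udev's probing.
            sleep(random.uniform(0, max_life))
            run(remove)

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Consuming the results re-raises any exception raised in a worker.
        list(pool.map(worker, range(jobs)))


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    stress("vg0", "sid", jobs=2, rounds=1, max_life=0.1, run=print)
```

Each worker reuses a single snapshot name, so parallel jobs never collide on names and every removal follows its own creation.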

This was partly due to udev races: udev likes to open and poke around with LVs in response to a trigger on creation, which races with deletion if the deletion comes very quickly. I've seen undeletable LVs and snapshots, oopses, and full kernel lockups with no panic. This stuff appears not to have been stress tested.

I switched to Btrfs snapshots, which were more reliable, but the rapid snapshot churn would unbalance the filesystem into a read-only state in about 18 hours. Overlays worked, but with caveats. We ended up going back to unpacking tarballs for reliability. I'm currently writing ZFS snapshot support; I should have done that years ago instead of bothering with Btrfs.
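A minimal sketch of what snapshot-based session setup on ZFS might look like, using the standard `zfs snapshot`, `zfs clone` and `zfs destroy` subcommands. This is not the schroot ZFS support mentioned above; the dataset name `tank/chroots/sid`, the session name and the `<source>-<session>` clone naming are assumptions for illustration. Because ZFS snapshots are read-only, a writable build session needs a clone of the snapshot.

```python
import subprocess


def session_begin(source_fs, session):
    """Commands to snapshot the source chroot dataset and clone it for one session."""
    snap = f"{source_fs}@{session}"
    clone = f"{source_fs}-{session}"  # hypothetical naming scheme
    return [["zfs", "snapshot", snap], ["zfs", "clone", snap, clone]]


def session_end(source_fs, session):
    """Commands to tear the session down again."""
    # Destroy the clone first: a snapshot with dependent clones cannot be
    # destroyed without -R.
    return [
        ["zfs", "destroy", f"{source_fs}-{session}"],
        ["zfs", "destroy", f"{source_fs}@{session}"],
    ]


def run_all(commands, run=None):
    """Execute each argument list in order; check=True stops at the first failure."""
    if run is None:
        run = lambda argv: subprocess.run(argv, check=True)
    for argv in commands:
        run(argv)


if __name__ == "__main__":
    # Dry run with hypothetical dataset and session names.
    run_all(session_begin("tank/chroots/sid", "build42"), run=print)
    run_all(session_end("tank/chroots/sid", "build42"), run=print)
```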


In my work identity, we saw a similar problem in our testing, where blkid would cause unwanted IO on fresh devices. Eventually we stopped blkid from scanning our device-mapper devices on state changes with a rule file, /etc/udev/rules.d/59-no-scanning-our-devices.rules, containing:

  ENV{DM_NAME}=="ourdevice", OPTIONS:="nowatch"

Alternatively, you could call 'udevadm settle' after device creation, before doing anything else; I think that will let blkid finish its IO first.
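A minimal Python sketch of that ordering, assuming LVM snapshots as in the original post: create the device, then block in `udevadm settle` until the udev event queue is empty before using or removing it. The VG/LV names and the 60-second timeout are hypothetical, and whether settling fully closes the race depends on the probing finishing while the creation event is being processed.

```python
import subprocess


def default_run(argv):
    # udevadm settle exits non-zero if the queue is not empty within the
    # timeout; check=True turns that (or any lvm failure) into an exception.
    subprocess.run(argv, check=True)


def create_snapshot_settled(vg, origin, name, size="1G", timeout=60, run=default_run):
    """Create vg/name as a snapshot of vg/origin, then wait for udev to go idle."""
    run(["lvcreate", "--snapshot", "--size", size, "--name", name, f"{vg}/{origin}"])
    # Wait until queued udev events, including rules such as blkid probing
    # triggered by the new device, have been processed.
    run(["udevadm", "settle", f"--timeout={timeout}"])
    return f"/dev/{vg}/{name}"


def remove_snapshot_settled(vg, name, timeout=60, run=default_run):
    """Settle again before removal to reduce the chance udev still has the LV open."""
    run(["udevadm", "settle", f"--timeout={timeout}"])
    run(["lvremove", "--force", f"{vg}/{name}"])
```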

Yes, we did something similar to disable the triggers. Unfortunately, while this resolved some issues such as being unable to delete LVs which were erroneously in use, it didn't resolve the oopses and kernel freezes which were presumably locking problems or similar inside the kernel.