| As it says in that slide deck's first slide (after the title slide), second bullet, this particular device removal technique is meant to deal with an "oops" where one accidentally adds a storage vdev to an existing pool. The zpool command line utility tries hard to help you not shoot yourself in the foot, but "zpool add -f pool diskname" sometimes happens when "zpool add -f pool cache diskname" was meant. Everyone's done it once. Think of a system melting down because the l2arc has died: you're trying to replace it in a hurry, and you fat-finger the attempt to get rid of the "-n" and end up getting rid of "log" instead.
Without this device removal, that essentially dooms your pool -- there is no way to back out, and the best you can do is throw hardware at the pool: attach another device fast to mirror the single-device vdev, then try to grow the vdev to something temporarily useful, where "temporarily" almost always means "as long as it takes to get everything properly backed up", with the goal being the destruction and re-creation of the pool (plus restoral from backups). With this device removal, you do not have to destroy your pool; you have simply leaked a small amount of space (possibly permanently) and will carry a seek penalty on some blocks (possibly permanently, but that's rarer) that get written to that vdev before the replacement.
As noted further in the slide deck (and in Alex's blog entries), this only works for single-device vdevs -- you cannot remove anything else, like a raidz vdev, and you have to detach devices from mirror vdevs before removal. Also, note the overheads: although you can remove a single-device vdev with a large amount of data on it, doing so is a wrecking ball to resources, particularly memory. You won't want to do something like:
Before:
mirror-0
disk0 2tb-used 3tb-disk-size
disk1 2tb-used 3tb-disk-size
mirror-1
disk2 2tb-used 3tb-disk-size
disk3 2tb-used 3tb-disk-size
do an expand dance, so you have
mirror-0
disk0 2tb-used 6tb-disk-size
disk1 2tb-used 6tb-disk-size
mirror-1
disk2 2tb-used 3tb-disk-size
disk3 2tb-used 3tb-disk-size
then detach disk3, then device-removal remove disk2, except in extremely special circumstances, and only where you are well aware of the time it will take, the danger to the unsafe data in the pool during the removal (i.e., everything in former mirror-1), the fact that your pool will be trashed beyond hope in the presence of crashes or errors during the removal, and the fact that you will have a permanent, expensive overhead in the pool after the removal is done. It would almost certainly be much faster and vastly safer to make a new pool with the 6tb disks and zfs send data from the old one to the new one. |
Certainly, it would be much less exciting to send|recv from poolA to poolB, and it would require no code changes and incur none of the GB-per-TB-of-data indirection overhead.
But this was intended as an example of how many caveats and problems are involved in even a "simple" feature involving shuffling data on-disk, and thus, why "defrag" is a horrendously hard problem in this environment.
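For comparison, the boring send|recv route might look roughly like the sketch below. The disk names (disk4, disk5) and the snapshot name (migrate) are placeholders I made up, and the mirror layout for the new pool is an assumption. By default the script only prints the commands; it runs them only if you set RUN=1, which needs root and an installed ZFS.

```shell
#!/bin/sh
# Sketch only: pool, disk, and snapshot names are hypothetical placeholders.
# Default is a dry run that prints each command; RUN=1 executes them.

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

migrate_pool() {
    old=$1
    new=$2
    snap=$3
    shift 3    # remaining arguments: the disks for the new mirror

    # Build the new pool from the larger disks (assumed here to be a mirror).
    run zpool create "$new" mirror "$@"

    # Recursive snapshot, so every dataset is captured at the same point.
    run zfs snapshot -r "$old@$snap"

    # Replicate the whole dataset hierarchy into the new pool.
    if [ "${RUN:-0}" = "1" ]; then
        zfs send -R "$old@$snap" | zfs recv -F "$new"
    else
        printf '%s\n' "zfs send -R $old@$snap | zfs recv -F $new"
    fi
}

migrate_pool poolA poolB migrate disk4 disk5
```

Nothing here rewrites block pointers in place: the old pool stays intact until you decide to destroy it, which is exactly why it is so much less exciting.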