by cesarb
912 days ago
IMO, part of the issue is that something which used to be just a low-level optimization (don't store large sequences of zeros) became visible to userspace (SEEK_HOLE and friends). Quoting from this article: "This is allowed; its always safe to say there’s data where there’s a hole, because reading a hole area will always find “zeroes”, which is valid data." But I recall reading elsewhere a discussion about some userspace program which did depend on holes being present in the filesystem as actual holes (visible to SEEK_HOLE and so on) and not as runs of zeros. Combined with holes being restricted to specific alignments and sizes, this means that the underlying "sequence of fixed-size blocks" implementation is leaking too much through the abstract "stream of bytes" representation we're more used to. Perhaps it might be time to rethink our filesystem abstractions?
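For anyone who hasn't poked at this interface, here is a minimal sketch of how userspace gets to see the hole/data layout. It assumes a platform where Python exposes `os.SEEK_DATA` and `os.SEEK_HOLE` (Linux and some other Unixes); the helper name `data_segments` is made up for illustration. A filesystem that doesn't track holes is allowed to report the whole file as one data segment, which is exactly the "always safe to say there's data" case quoted above.

```python
import errno
import os


def data_segments(path):
    """Return [(start, end), ...] byte ranges the filesystem reports as data."""
    segments = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Next offset >= `offset` that the filesystem calls data.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # only a hole remains up to EOF
                    break
                raise
            # Next hole after that data; EOF counts as an implicit hole.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            if end <= start:  # defensive: never loop without progress
                break
            segments.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return segments
```

The segment boundaries you get back are whatever the filesystem chooses to report (typically block-aligned), not the byte ranges a program actually wrote, so the same file can yield different results on different filesystems.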
"treatment of on-disk segments as "what was written by programs" can cause areas of 0 to not be written by bmaptool copy":
https://github.com/intel/bmap-tools/issues/75
IMO, the issue here isn't filesystem or ZFS behavior; it's that bmaptool wants an extra "don't care" bit per block, which filesystems (traditionally) don't track and which programs interacting with filesystems don't expect to exist.
Some of the comments I've made in this issue describe options to make things better.
(FWIW: the original HN link discusses a different issue around SEEK_HOLE/SEEK_DATA, and the bmaptool issue is the reverse of the one the parent posits: bmaptool relies on explicitly written runs of zeros not being holes, and on particular behavior from the programs writing the data.)
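To make the "don't care" bit concrete, here is a toy, in-memory sketch. It is not bmaptool's code; the block size, the function name `copy_mapped_blocks`, and the two block maps are all made up for illustration. The point is only that a copy driven by a "which blocks are mapped" list behaves differently depending on whether explicitly written zeros show up as data or as a hole.

```python
BLOCK = 4  # toy block size in bytes (assumption; real tools use much larger blocks)


def copy_mapped_blocks(source, mapped, target):
    """Copy only the blocks whose indices are in `mapped` from source onto target."""
    out = bytearray(target)
    for index in mapped:
        lo, hi = index * BLOCK, (index + 1) * BLOCK
        out[lo:hi] = source[lo:hi]
    return bytes(out)


source = b"AAAA" + b"\x00" * BLOCK + b"BBBB"  # block 1: zeros the writer meant to keep
stale_target = b"XXXX" * 3                    # destination that still holds old contents

map_from_holes = [0, 2]     # block 1 was stored as a hole, so the map omits it
map_with_zeros = [0, 1, 2]  # block 1 was kept as data, so the map includes it

result_holes = copy_mapped_blocks(source, map_from_holes, stale_target)
result_zeros = copy_mapped_blocks(source, map_with_zeros, stale_target)

print(result_holes)  # b'AAAAXXXXBBBB' (stale bytes survive in block 1)
print(result_zeros == source)  # True
```

With only "data" and "hole" available, the map can't distinguish "these zeros matter" from "nobody cares what's here", which is the missing third state.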