by viraptor
1031 days ago
Does it actually not update in place even for areas with a single reference? I haven't checked the source, but that sounds like fragmentation hell on spinning disks. That would absolutely kill the performance on zfs-hosted VM images / databases, which I didn't think actually happens... (Apart from the intent log, which sure, that's append only)
|
ZFS really deeply assumes that, once a region is in use, it will not change until it's no longer in use anywhere. It also won't reuse things you just freed for a certain number of txgs afterward, so that you can roll back a couple of txgs in case of dire problems without excitement. (Since having enough writes will cause more txgs to happen faster, being unable to use newly freed space isn't an issue people run into in practice.)
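To make the "never overwrite, and don't reuse freed space right away" idea concrete, here's a toy model in Python. It is a sketch for illustration only, not how ZFS is implemented: the class name, the `defer_txgs` knob and its value of 2, and the flat list of block addresses are all made up.

```python
class ToyCowPool:
    """Toy copy-on-write allocator with deferred frees. Not ZFS code."""

    def __init__(self, nblocks, defer_txgs=2):
        self.free = list(range(nblocks))  # addresses that may be allocated now
        self.deferred = []                # (txg_freed, addr) not yet reusable
        self.live = {}                    # logical id -> (addr, birth_txg)
        self.txg = 1
        self.defer_txgs = defer_txgs      # hypothetical knob, value is arbitrary

    def write(self, logical_id):
        """Never overwrite in place: allocate a new block, defer freeing the old."""
        addr = self.free.pop(0)  # raises IndexError if the toy pool is full
        old = self.live.get(logical_id)
        if old is not None:
            self.deferred.append((self.txg, old[0]))
        self.live[logical_id] = (addr, self.txg)
        return addr

    def sync(self):
        """Close the current txg and release frees that have aged out."""
        self.txg += 1
        still_waiting = []
        for freed_txg, addr in self.deferred:
            if self.txg - freed_txg > self.defer_txgs:
                self.free.append(addr)
            else:
                still_waiting.append((freed_txg, addr))
        self.deferred = still_waiting
```

Rewriting the same logical block in a later txg lands it at a different address, and the old address only becomes allocatable again a few syncs later, which is the window that would make a rollback of a couple of txgs safe in this model.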
Also in practice, defining what "sequential" means with multiple disks in nontrivial topologies becomes...exciting anyway. For writes on spinning media, you only care that things are relatively, not absolutely, sequential, and on reads, prefetch is going to notice you doing heavily sequential IO and queue things up anyway. (IMO)
If you like, you could go check, on your own configurations, what the DVAs for the different data blocks in your VM images are, with something like zdb -dbdbdbdbdbdb [dataset] [object id]. The object id is the "inode number" of the file, or if it's a zvol, I think it's always just 1 that all the data you think of as the "disk" goes in.
You'll almost certainly find that the regions that changed more than a couple txgs apart (the "birth=" value is the logical/physical txg in which the record was created) are mostly not remotely sequential.
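If you've pulled the offsets and sizes out of the DVAs by hand, a rough way to put a number on "not remotely sequential" is to count how many logically adjacent blocks are also physically adjacent. This is a sketch under assumptions: it does not parse zdb output, it expects you to supply (offset, size) pairs in logical file order, and it pretends everything is on a single vdev. The function name is made up.

```python
def contiguous_fraction(blocks):
    """Fraction of logically adjacent block pairs that are physically contiguous.

    blocks: list of (offset, size) tuples in logical file order,
            all assumed to live on the same vdev.
    """
    if len(blocks) < 2:
        return 1.0
    pairs = zip(blocks, blocks[1:])
    hits = sum(1 for (off, size), (next_off, _) in pairs if off + size == next_off)
    return hits / (len(blocks) - 1)
```

A value near 1.0 means the file is laid out mostly in order; a value near 0.0 means nearly every block boundary is a seek on spinning media.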
(Nit - there are two exceptions that come to mind. First, the uberblocks: they sit at basically a fixed position on disk relative to the disk's size, with a fixed size, and you get [fixed size]/[minimum allocation size] of them in what is basically a ring buffer before you overwrite the oldest one. That happens by just overwriting it in place, since it's technically not in use any more; someone just might want to roll back to it in a "This Should Never Happen(tm)" case. Second, the newly added feature of corrective send/recv, which lets you feed ZFS a send stream of an "intact" copy of something that had an uncorrectable data error and have it scribble over the mangled copy with the fixed one in-place, assuming it passes the checksums.)
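The uberblock ring can be pictured with a toy model like the one below. Treat it as a sketch under assumptions, not the on-disk format: the default sizes are illustrative, picking the slot as txg modulo the slot count is a simplification, and the class and method names are invented.

```python
class ToyUberblockRing:
    """Toy fixed-size ring of uberblock slots, overwritten in place."""

    def __init__(self, ring_bytes=128 * 1024, slot_bytes=4096):
        # Number of slots = [fixed size] / [minimum allocation size].
        self.slots = [None] * (ring_bytes // slot_bytes)

    def commit(self, txg):
        """Overwrite the slot for this txg; whatever was there is gone."""
        self.slots[txg % len(self.slots)] = txg

    def newest(self):
        """The txg a normal import would pick (assumes at least one commit)."""
        return max(t for t in self.slots if t is not None)

    def rollback_candidates(self):
        """Every txg still present in the ring, newest first."""
        return sorted((t for t in self.slots if t is not None), reverse=True)
```

With 4 slots and txgs 1 through 6 committed, only txgs 3 to 6 are still available to roll back to; 1 and 2 were overwritten in place.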