by josefbacik 3680 days ago
NOCOW is horribly expensive because we still have to go check and make sure that there are no snapshots pointing at the changing extents. It only solves the fragmentation issue, and if you don't prealloc your image it doesn't even do that.

Are you sure that disabling CoW solves btrfs' fragmentation issue when you pre-allocate the image? If the volume is snapshotted regularly, CoW should kick in once on each extent after a snapshot, and then stay off for that extent until the next snapshot reactivates it. That should mean that the fragmentation would still occur, although more slowly. Is that correct or am I misunderstanding something?

I am only superficially familiar with btrfs internals, but I do not see any way to implement snapshots without either doing CoW or duplicating the data in its entirety. If you are checking to see if the data is part of a snapshot, then you should be doing CoW.

Yeah, nocow only works if there are no snapshots, which is why we have to check. If you use snapshots, it falls back to COW.
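To make the fallback concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not how btrfs is implemented: the `Extent` and `ToyNocowFile` classes, the per-extent `refs` counter, and the way new locations are handed out are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    location: int  # hypothetical physical address, in extent-sized units
    refs: int = 1  # number of trees (file plus snapshots) pointing at it


class ToyNocowFile:
    """Toy model of a preallocated NOCOW file. Not btrfs's real structures."""

    def __init__(self, n_extents):
        # Preallocated contiguously: extent i lives at location i.
        self.extents = [Extent(location=i) for i in range(n_extents)]
        # Assumption: new allocations land far away from the original run.
        self.next_free = n_extents + 1000
        self.cow_writes = 0
        self.inplace_writes = 0

    def snapshot(self):
        # A snapshot shares every current extent, so each gains a reference.
        for extent in self.extents:
            extent.refs += 1

    def write(self, index):
        extent = self.extents[index]
        if extent.refs > 1:
            # Shared with a snapshot: fall back to COW despite NOCOW.
            extent.refs -= 1
            self.extents[index] = Extent(location=self.next_free)
            self.next_free += 1
            self.cow_writes += 1
        else:
            # Nobody else points at it: overwrite in place.
            self.inplace_writes += 1

    def fragments(self):
        # Count runs of physically contiguous extents.
        locs = [extent.location for extent in self.extents]
        return 1 + sum(1 for a, b in zip(locs, locs[1:]) if b != a + 1)


if __name__ == "__main__":
    image = ToyNocowFile(8)
    image.write(3)      # in place, still one contiguous run
    image.snapshot()
    image.write(3)      # first write after the snapshot is COW
    image.write(3)      # later writes to the same extent are in place again
    print(image.cow_writes, image.inplace_writes, image.fragments())
```

In this model each extent is COWed at most once per snapshot, so a preallocated image still fragments, just more slowly than a plain COW file would.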
Disabling CoW disables snapshotting.
(I'm not an expert, just a user) Snapshotting effectively disables NOCOW. Snapshotting always works, and needs COW to do so. Which means ryao is right that fragmentation will occur. btrfs doesn't copy a NOCOW file in its entirety if it's in a snapshot, or I would have run out of disk space by now. (I really need to move my VM images to a non-backed-up subvolume.)
No it doesn't.
How about if you have a NOCOW subvolume with just a few files on it, e.g. 3 VM images? Is it still expensive to check for snapshots? Does the cost of checking disappear if you have no snapshots of the volume? As ryao mentioned, you'd still get fragmentation if you have snapshots, so in that case is using NOCOW for a VM image counterproductive?

Seems like the expense of checking could be largely removed with clever enough metadata caching, which probably no one has had time to implement.

We have to look up the physical extent in the extent reference tree, so the cost is independent of the number of snapshots and more a function of the fragmentation of the extent tree. The metadata is all cached of course, but fragmentation means you are likely to not find the entries in cache.
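A rough way to see why fragmentation hurts the cache, sketched in Python. This is a simplified simulation, not btrfs code: the function name `count_leaf_misses`, the fixed-size "leaf" buckets, and the small LRU cache are all assumptions made for illustration. Each write looks up one physical address, and a lookup is a hit only if the leaf covering that address is still cached. The number of snapshots does not appear anywhere in the model.

```python
from collections import OrderedDict


def count_leaf_misses(extent_addresses, addrs_per_leaf=100, cache_leaves=4):
    """Count cache misses when looking up each address in turn.

    Hypothetical model: the extent tree is keyed by physical address, each
    leaf covers `addrs_per_leaf` consecutive addresses, and only
    `cache_leaves` leaves fit in an LRU cache.
    """
    cache = OrderedDict()
    misses = 0
    for addr in extent_addresses:
        leaf = addr // addrs_per_leaf
        if leaf in cache:
            cache.move_to_end(leaf)  # mark as most recently used
        else:
            misses += 1
            cache[leaf] = True
            if len(cache) > cache_leaves:
                cache.popitem(last=False)  # evict least recently used leaf
    return misses


# Sequential pass over a contiguous file: neighbors share a leaf.
contiguous = list(range(1000))

# Same 1000 addresses, but consecutive file offsets map to scattered
# physical locations (137 is coprime with 1000, so this is a permutation).
fragmented = [(i * 137) % 1000 for i in range(1000)]

if __name__ == "__main__":
    print("contiguous misses:", count_leaf_misses(contiguous))
    print("fragmented misses:", count_leaf_misses(fragmented))
```

Both runs do the same number of lookups against the same set of addresses. Only the locality differs, and that is what drives the miss count.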

The other aspect that I haven't talked about is our fsync performance is kind of shit compared to other fs'es. Now this does get better in the nocow case but it's still pretty heavy and needs optimization.