by josefbacik 3684 days ago
There are no docs. I'm not a qcow2 expert; what I know is very basic, so anything I say about qcow2 could be very wrong.

So qcow has a read-only base image, and changes get layered on top of it. The image format just holds the changes from the original image. So you update a package, it adds some metadata to point at the new stuff, adds the data in, and you are done.

So with btrfs you have this image file sitting on top of btrfs, and you update a file and its metadata inside the image. Say you start with a pristine image that's in nice big extents. You update a package, which changes small chunks all over the file. Let's say you update 12 4K extents. So instead of one extent you now have 36 extents. This affects everything: fsyncs take longer because there are more extents we have to write out, the space is more fragmented so cold cache reads are more expensive, and the csums are no longer contiguous so they also take up a larger, more fragmented area. It has this really terrible cascading effect.
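
To make the splitting concrete, here is a rough Python sketch of CoW overwrites landing inside one big extent. It is a toy model, not btrfs code: the `(start, length, tag)` tuples, the 1 GiB file size and the write offsets are all made up for illustration. In this model each isolated 4K write adds two extents (one new piece plus one extra leftover piece), so 12 writes turn 1 extent into 25. The 36 above presumably counts three pieces per write. Either way the count grows with every small write.

```python
BLOCK = 4096  # size of each small overwrite, in bytes


def overwrite(extents, offset, length):
    """Return the extent list after a CoW overwrite of [offset, offset + length).

    Each extent is a (start, length, tag) tuple. The overwritten range becomes
    a new extent; the parts of old extents on either side survive as-is.
    """
    end = offset + length
    result = []
    for start, ext_len, tag in extents:
        ext_end = start + ext_len
        if ext_end <= offset or start >= end:
            result.append((start, ext_len, tag))  # not touched by this write
            continue
        if start < offset:
            result.append((start, offset - start, tag))  # surviving head
        if ext_end > end:
            result.append((end, ext_end - end, tag))  # surviving tail
    result.append((offset, length, "new"))
    return sorted(result)


def simulate(file_size, write_offsets):
    """Start from one pristine extent and apply one 4K overwrite per offset."""
    extents = [(0, file_size, "old")]
    for off in write_offsets:
        extents = overwrite(extents, off, BLOCK)
    return extents


if __name__ == "__main__":
    size = 1024 * 1024 * 1024  # hypothetical 1 GiB image file
    # 12 scattered, non-adjacent 4K writes (hypothetical offsets)
    offsets = [(2 * i + 1) * 16 * 1024 * 1024 for i in range(12)]
    print(len(simulate(size, offsets)), "extents after 12 writes")
```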

2 comments

That is a problem with CoW in general. ZFS uses a variable level indirect block tree. That puts an upper bound on fragmentation.
I tested it with preallocated raw images on Btrfs and that setup was slow as well. If I understand you correctly, it must be possible to somehow align VM disk sectors to Btrfs sectors for a performance boost. Or are compression and deduplication going to kill performance anyway?
Btrfs internally uses logical addresses for all extents. The mapping from logical to physical is done via the chunk tree, which indicates not only the physical sector but also the device. So the reference for a file extent says nothing about which device the extent is on or about the replication (RAID profile), since that is all a function of the chunk and dev trees.
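
As a rough illustration of that indirection, here is a toy Python model of a logical-to-physical lookup. It is not the real on-disk format: `Chunk`, `ChunkMap` and `resolve` are hypothetical names, and it only handles mirrored-style profiles where every stripe holds a full copy (no RAID0/5/6 stripe arithmetic). The point is only that the caller hands in a logical address and learns the devices and profile from the chunk, not from the file extent.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    logical: int    # first logical address covered by this chunk
    length: int     # bytes of logical address space covered
    profile: str    # e.g. "single", "dup", "raid1" (toy labels)
    stripes: tuple  # ((devid, physical_start), ...), one full copy each


class ChunkMap:
    """Toy stand-in for the chunk tree: logical address -> device locations."""

    def __init__(self, chunks):
        self.chunks = sorted(chunks, key=lambda c: c.logical)
        self.starts = [c.logical for c in self.chunks]

    def resolve(self, logical):
        # Find the last chunk whose start is <= the logical address.
        i = bisect.bisect_right(self.starts, logical) - 1
        if i < 0:
            raise KeyError(f"no chunk covers logical address {logical}")
        chunk = self.chunks[i]
        offset = logical - chunk.logical
        if offset >= chunk.length:
            raise KeyError(f"no chunk covers logical address {logical}")
        # Every stripe holds a copy, so the same offset applies on each device.
        return chunk.profile, [(dev, phys + offset) for dev, phys in chunk.stripes]


if __name__ == "__main__":
    GiB, MiB = 1 << 30, 1 << 20
    cmap = ChunkMap([
        Chunk(1 * GiB, 1 * GiB, "raid1", ((1, 100 * MiB), (2, 200 * MiB))),
        Chunk(2 * GiB, 1 * GiB, "single", ((1, 2 * GiB),)),
    ])
    # A file extent only stores the logical address; the rest comes from the map.
    print(cmap.resolve(1 * GiB + 4096))
```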

I think Btrfs for a guest FS is best pointed to an LV, rather than qcow2. It's been a while since I benchmarked that against 'qemu-img create -f qcow2 -o nocow=on', which sets the +C file attribute (the same flag chattr +C sets) on the file, making it nocow. The nocow attribute helps a lot with this problem.

ZFS also does better when the guests are stored on zvols. If volblocksize=4K is set on creation, it can avoid read-modify-write overhead. Regular files can be used too, but those have somewhat higher overhead. In that case, recordsize=4K can be used.
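
The read-modify-write cost can be sketched with a bit of arithmetic. This is a simplified Python model, not ZFS code: `rmw_cost` is a hypothetical helper, and it assumes nothing is cached, so any block that is only partly covered by a write has to be read in full before it can be rewritten. Compression, caching and metadata are ignored.

```python
def rmw_cost(offset, length, block_size):
    """Return (bytes_read, bytes_written) for one write in a block-based CoW store.

    Simplifying assumption: a block only partly covered by the write must be
    read in full first, and every touched block is rewritten in full.
    """
    first = offset // block_size
    last = (offset + length - 1) // block_size
    end = offset + length
    bytes_read = 0
    for b in range(first, last + 1):
        b_start = b * block_size
        b_end = b_start + block_size
        fully_covered = offset <= b_start and end >= b_end
        if not fully_covered:
            bytes_read += block_size  # must fetch the old contents first
    bytes_written = (last - first + 1) * block_size
    return bytes_read, bytes_written


if __name__ == "__main__":
    # One aligned 4K guest write at offset 8K, against different block sizes.
    for bs in (4 * 1024, 8 * 1024, 128 * 1024):
        read, written = rmw_cost(8192, 4096, bs)
        print(f"block size {bs // 1024:>3}K: read {read} bytes, write {written} bytes")
```

With a 4K block size the guest's 4K write covers a whole block, so nothing has to be read back. With a 128K block size the same write reads and rewrites 128K in this model.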

That said, I do not think the tests involved nesting CoW file systems.