by mhx77 2031 days ago

This is really cool, I'll give squashfs-tools-ng a try!

> Does the Raspbian image comparison compare XZ compression against SquashFS with Zstd?

That's correct. It's not an exhaustive matrix of comparisons.

> Also, is there any documentation on how the on-disk format for DwarFS and it's packing works which might explain the incredible size difference?

The format as of 0.2.0 is actually quite simple. It's a list of compressed data blocks, followed by a metadata block (and a schema describing the metadata block). The metadata format is implemented by and documented in [1].

There are probably 3 things that contribute to the compression ratio:

1) Block size. DwarFS can use arbitrary block sizes (artificially limited to powers of two), and uses a much larger block size (16M) by default. SquashFS doesn't seem to be able to go higher than 1M.

2) Ordering files by similarity.

3) Segment deduplication. If segments of files overlap with previously seen data, these segments are referenced instead of written again. The minimum size of these segments can be configured and defaults to 2k. For my primary use case, of the 47.6 GB of input data, 28.2 GB are saved by file-level deduplication, and another 12.4 GB by this segment-level deduplication. So before the "real" compression algorithms actually kick in, there are only 7 GB of data left. As these are ordered by similarity, and stored in rather big blocks, some of the 16M blocks can actually be compressed down to less than 100k.
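
To make the two deduplication stages in 3) concrete, here is a rough Python sketch. It is not DwarFS's actual code: the function names `dedupe_files` and `dedupe_segments` are made up for illustration, and the segment matching is simplified to fixed, aligned 2k windows. A real implementation would more likely use something like a rolling hash so that matches can be found at arbitrary offsets.

```python
import hashlib


def dedupe_files(files):
    """File-level dedup: keep one copy of each distinct file content.

    `files` maps a path to its bytes (hypothetical input shape).
    """
    unique = {}  # content digest -> first path seen with that content
    for path, data in files.items():
        unique.setdefault(hashlib.sha256(data).digest(), path)
    return [files[path] for path in unique.values()]


def dedupe_segments(blobs, window=2048):
    """Segment-level dedup, simplified to aligned fixed-size windows.

    Returns the bytes that still need to be stored and the number of
    windows that were replaced by a reference to earlier data.
    """
    seen = set()        # digests of full windows already written
    out = bytearray()   # data left over for the "real" compressor
    refs = 0
    for data in blobs:
        for pos in range(0, len(data), window):
            chunk = data[pos:pos + window]
            key = hashlib.sha256(chunk).digest()
            if len(chunk) == window and key in seen:
                refs += 1          # store a reference, not the bytes
                continue
            if len(chunk) == window:
                seen.add(key)
            out += chunk
    return bytes(out), refs


files = {
    "a": b"x" * 4096,
    "b": b"x" * 4096,                 # exact duplicate of "a"
    "c": b"y" * 2048 + b"x" * 2048,   # shares a 2k segment with "a"
}
unique_blobs = dedupe_files(files)
remaining, refs = dedupe_segments(unique_blobs)
```

In this toy input, 12288 bytes go in, file-level dedup drops one whole file, and the aligned segment pass replaces two more 2k windows with references, so only 4096 bytes are left for the compressor. The same bookkeeping applies to the numbers above: 47.6 GB minus 28.2 GB minus 12.4 GB leaves 7.0 GB.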

[1] https://github.com/mhx/dwarfs/blob/main/thrift/metadata.thri...