Hacker News new | ask | show | jobs
by karteum 552 days ago
I was also thinking about extracting the MPQ on-disk e.g. "foo.mpq" -> ".foo.mpq.extracted/..." (which - to my understanding - would lead to a lot of deduplication even just with jdupes). N.B. depending on the requirements with regards to read performance, you could design a FUSE filesystem that would transparently reconstruct the MPQ on-the-fly.

Of course, I guess all of this would be much slower than being able to use inherent btrfs features as the author of the post wishes, but at least it would be independent from btrfs :)

1 comments

prying apart the MPQ file into it's parts and writing them down to disk is at the moment probably the strongest contender for my solution. it will cause the parts to be block aligned, be able to be hardlinked and cut down on metadata and improve performance (same inode when hardlinked). only thing i need in this case is to have a script to reassamble those extracted ones into the original MPQ archives, which have to match byte-by-byte to the original content ofc. extracting them into it's distinct parts also allows to access the contents directly, if so desired, without needing to extract them on demand (some people wanna look up assets in specific versions). these distinct parts can then additionally be deduplicated on block level as well.
I was curious about MPQ format and found this tool: https://github.com/eagleflo/mpyq

Maybe you can use it to decompress some files and assess how much disk space you can save using deduplication at filesystem level.

If it's worth the effort (1 mean, going from 1TiB to 100-200 Gib) I would consider coding the reassembly part. It can be done by a "script" fist, then can be "promoted" to a FUSE filesystem if needed.