Hacker News
by SeptiumMMX 502 days ago
Given that it's BWT, the difference should be most prominent on codebases with huge numbers of nearly identical files. Most compression algorithms can't exploit an exact duplicate of a block once it falls outside the compression window (and they get less efficient when the duplicate sits near the far end of the window).

But here's a practical trick: sort files by extension and then by name before putting them into an archive, and then use any conventional compression. That will very likely put similar-looking files next to each other and save you space. I've done that in practice, and it works like a charm.
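
For anyone who wants to try it, here is a rough sketch of the sorting trick in Python. The function names (`sort_key`, `sorted_files`, `make_sorted_archive`) and the choice of a tar archive with xz compression are my own assumptions for illustration, not anything the tools above prescribe. Any archiver that stores files in the order you give them should behave the same way.

```python
import tarfile
from pathlib import Path


def sort_key(path):
    # Extension first, then file name; the full path is only a tiebreaker
    # so that the resulting order is deterministic.
    return (path.suffix.lower(), path.name.lower(), path.as_posix())


def sorted_files(root):
    # Collect every regular file under root, ordered by extension, then name.
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return sorted(files, key=sort_key)


def make_sorted_archive(root, archive_path, mode="w:xz"):
    # Hypothetical helper: writes a compressed tar whose members are stored
    # in the sorted order, so similar files end up adjacent in the stream.
    root = Path(root)
    # Build the list before creating the archive, in case the archive
    # itself lives somewhere under root.
    files = sorted_files(root)
    with tarfile.open(archive_path, mode) as tar:
        for path in files:
            arcname = path.relative_to(root).as_posix()
            tar.add(path, arcname=arcname, recursive=False)
```

Calling `make_sorted_archive("my_project", "my_project.tar.xz")` would store all `.c` files together, then all `.h` files, and so on. How much this saves depends entirely on the data and on the compressor's window size, so measure it on your own files.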

2 comments

Handy tip for 7-Zip: the `-mqs` command line switch (just `qs` in the Parameters field of the GUI) does this for you. https://7-zip.opensource.jp/chm/cmdline/switches/method.htm#...
Ooh, that’s neat. How much improvement do you get from this? Is it more of a single- or double-digit % difference?