How much better would that get if you appended all but one of the equal-size documents? (Or other combinations, like 2 of the top results after using a single one.)
Better, if the compressor can actually use all that extra context. Gzip, and most traditional general-purpose compressors, can't: DEFLATE only looks back 32 KB, so anything earlier than that is simply out of reach.
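A quick way to see the window limit, assuming Python's `zlib` module (the same DEFLATE algorithm gzip uses, just a different wrapper). Random bytes stand in for documents, and the extra compressed size of a document is measured twice: once with an identical copy right before it, and once with that copy pushed more than 32 KB back by unrelated filler. The helper names are made up for the illustration.

```python
import random
import zlib

def clen(data: bytes) -> int:
    """Length of the zlib (DEFLATE) compressed form of data, at max level."""
    return len(zlib.compress(data, 9))

def extra_cost(context: bytes, doc: bytes) -> int:
    """Approximate bytes needed to encode doc once the compressor has seen context."""
    return clen(context + doc) - clen(context)

rng = random.Random(0)
doc = rng.randbytes(2_000)      # incompressible stand-in for a "document"
filler = rng.randbytes(40_000)  # unrelated data, longer than DEFLATE's 32 KB window

near = extra_cost(doc, doc)          # identical copy directly before: can be referenced
far = extra_cost(doc + filler, doc)  # identical copy >32 KB back: can't be referenced

print(f"copy within window: +{near} bytes")
print(f"copy beyond window: +{far} bytes")
```

In the second case you should see roughly the full document size again: the earlier copy is there in the input, but gzip can no longer point back to it.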
It's hard to use distant context effectively. Even general-purpose compressors that could use it in theory often deliberately reset part of their context: assuming a big file keeps the same distribution throughout as at its beginning often hurts compression more than just starting over periodically.
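To see why resetting can pay off, here's a toy version with an adaptive order-0 model (not how any particular compressor does it). It computes the ideal code length in bits from Laplace-smoothed symbol counts, optionally forgetting all counts every `reset_every` symbols. `adaptive_cost_bits` and the test strings are hypothetical, just for the illustration.

```python
import math

def adaptive_cost_bits(data: str, alphabet: str, reset_every: int = 0) -> float:
    """Ideal code length of data, in bits, under an adaptive order-0 model.

    Every symbol starts with a count of 1 (Laplace smoothing). If reset_every
    is positive, all counts go back to 1 every reset_every symbols, so the
    model forgets what it learned so far. 0 means never reset.
    """
    def fresh():
        return {s: 1 for s in alphabet}, len(alphabet)

    counts, total = fresh()
    bits = 0.0
    for i, sym in enumerate(data):
        if reset_every > 0 and i > 0 and i % reset_every == 0:
            counts, total = fresh()
        # Cost of coding sym with the model's current probability estimate.
        bits -= math.log2(counts[sym] / total)
        counts[sym] += 1
        total += 1
    return bits

# Distribution shifts halfway through vs. stays the same throughout.
shifted = "a" * 50_000 + "b" * 50_000
stationary = "ab" * 50_000

for name, data in [("shifted", shifted), ("stationary", stationary)]:
    never = adaptive_cost_bits(data, "ab")
    periodic = adaptive_cost_bits(data, "ab", reset_every=1_000)
    print(f"{name:10s} never reset: {never:9.0f} bits   reset every 1000: {periodic:9.0f} bits")
```

On the shifted string, the never-reset model spends on the order of 100,000 bits, almost all of it on the `b` half while it slowly relearns that `b` is now common; resetting every 1000 symbols brings that down to around 1,000 bits. On the stationary string, resetting costs a few hundred bits extra, which is the trade-off being made.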