| I’d take this bet. > With this offer, you can tune your algorithm to my data. One can generate a 1GB file or 10GB file. It is highly likely that there is some form of a repeatable pattern in there to shave off 50-100 bytes by sliding window search. Then the decompressor is essentially - at this index, expand this pattern. The compressed file excludes the range of pattern. One may not always get such a pattern, but on multiple tries it’s very feasible. Payout just needs one win in 50 tries. You could generate a 100GB file. The bigger the file the higher the chance of repeating pattern. The challenge is won if compressed_file + decompressor is less one byte than original file. One could have a self executing decompressor to save some file overhead bytes. |
One argument: if your method worked, then we could compress nearly any file. This violates the pigeonhole principle.
Another example: for a 4GB file you’d need roughly a 4 byte integer to specify where the repeat started and where the second was, for 8 bytes. Then you need a length byte. Now you need a repeated 10+ bytes sequence to make this compress anything. There are 256^10=2^80 possible such 10 bytes sequence sequences, and only ~2^32 of them in a 4 gb file. So the odds of a repeated are around 1 in 2^50.
Tweak methods and estimate as you wish. It fails.