| > compressor comp.exe of size S1 that compresses enwik9 to archive.exe of size S2 such that S:=S1+S2 < L
That criterion is rather more complicated than just taking the size S2 of archive.exe.
There is no logical meaning to the sum of the size of a compressor and its output that I can see. Disregarding the compressor size would make this contest easier to understand as simply trying to determine the Kolmogorov Complexity (i.e. information content) of enwik9.
I looked for the motivation for including compressor size and only found this in the FAQ:
> By just measuring L(D)+L(A), one can freely hand-craft large word tables (or other structures) used by C and D, and place them in C and either D or A. By counting both, L(C) and L(D), such tables become 2-3 times more expensive, and hence discourages them.
Discouraging word tables seems like a weak and somewhat arbitrary justification for complicating the measure of merit. I don't think the contest would be any less interesting if the nature of the compression were disregarded. One could even argue that letting compression of Wikipedia take far more resources than decompression is justified, since compression has to be performed only once while the result could be decompressed millions of times. |
Disregarding program size altogether would make the size of the compressed file meaningless.
Disregarding the time complexity of compression is an interesting option, as is ignoring the space complexity during compression or even the size of the compressor.
But the size of the decompressor can't be ignored when trying to "measure" the Kolmogorov complexity of the source data.
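To make the point concrete, here is a rough sketch of why the decompressor has to be counted. Everything in it is an illustrative assumption: `data` is a small stand-in for enwik9, the two "decompressors" are just Python source strings, and `run` and `total` are hypothetical helpers. The "cheating" decompressor embeds the data itself, so its archive is empty, yet the combined size is larger than the original.

```python
import zlib

# Small stand-in for enwik9 (assumption: any repetitive text will do).
data = b"the quick brown fox jumps over the lazy dog " * 200

# Honest scheme: a small generic decompressor; the archive carries the information.
honest_archive = zlib.compress(data, 9)
honest_decompressor_src = (
    "import zlib\n"
    "def decompress(archive):\n"
    "    return zlib.decompress(archive)\n"
)

# Cheating scheme: the data is embedded in the decompressor; the archive is empty.
cheat_archive = b""
cheat_decompressor_src = (
    "def decompress(archive):\n"
    f"    return {data!r}\n"
)

def run(src, archive):
    """Execute decompressor source and apply it to the archive."""
    namespace = {}
    exec(src, namespace)
    return namespace["decompress"](archive)

def total(src, archive):
    """Size of decompressor source plus archive, in bytes."""
    return len(src.encode()) + len(archive)

# Both schemes reproduce the original data exactly.
assert run(honest_decompressor_src, honest_archive) == data
assert run(cheat_decompressor_src, cheat_archive) == data

print("original size:        ", len(data))
print("honest archive only:  ", len(honest_archive))
print("cheat archive only:   ", len(cheat_archive))
print("honest total (D + A): ", total(honest_decompressor_src, honest_archive))
print("cheat total (D + A):  ", total(cheat_decompressor_src, cheat_archive))
```

Ranked by archive size alone, the cheating scheme wins with zero bytes. Ranked by decompressor plus archive, it loses, which is the behaviour you want from anything that claims to approximate Kolmogorov complexity.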