by kanjus
2694 days ago
Great write-up!

> It is actually a very reasonable assumption for almost all kinds of data -- given that suitable compression is applied. Data, which is well-compressed, is essentially uniformly random.

What kinds of data are an exception? Your explanation seems to cover everything.
|
Second, you might (will) not be able to compress the data completely. A picture might be worth a thousand words, but it still takes up a megabyte or so on disk. That makes for about 1000 bytes per word ;) So the entropy/information of a picture might be very small ("A dog jumping into water"), but we have no chance of truly understanding a general source (reality) and expressing its full machinery.
Think about the difference between JPEG and PNG (or GZIP and a JavaScript minifier). They are designed around completely different assumptions about the source and even the receiver. JPEG assumes that the most important part of an image is what human vision perceives; PNG is lossless, but assumes high inter-pixel dependence. GZIP assumes general bytes (I think); JS minification assumes that there is a more fundamental representation of the source without noise (formatting, comments, reasonable names, dead functions).
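If you want to poke at this yourself, here is a rough Python sketch (not from the article, just an illustration). It assumes a made-up source that picks uniformly from eight words, so the source's true entropy is known to be 3 bits per word. The word list, the seed, and the `byte_entropy` helper are all my own inventions; `zlib` stands in for a general-purpose byte compressor such as GZIP.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Empirical Shannon entropy of the byte histogram, in bits per byte (max 8)."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Hypothetical toy source: each word is drawn uniformly from 8 choices,
# so the source carries exactly log2(8) = 3 bits of information per word.
WORDS = ["dog", "jumping", "into", "water", "picture", "entropy", "source", "noise"]
N_WORDS = 20_000

rng = random.Random(0)  # fixed seed so the run is repeatable
text = " ".join(rng.choice(WORDS) for _ in range(N_WORDS)).encode("ascii")

# General-purpose compression: zlib knows nothing about the word list.
packed = zlib.compress(text, 9)

# What a coder with full knowledge of the source model would need.
ideal_bytes = N_WORDS * math.log2(len(WORDS)) / 8

print(f"raw:    {len(text):>7} bytes, {byte_entropy(text):.2f} bits/byte")
print(f"zlib:   {len(packed):>7} bytes, {byte_entropy(packed):.2f} bits/byte")
print(f"ideal:  {ideal_bytes:>7.0f} bytes (3 bits per word)")
```

The compressed bytes should look much closer to uniform than the raw text, which is the quoted claim. The zlib output should still come out larger than the ideal size, because zlib only assumes "general bytes" and has to rediscover the structure that a source-aware coder would get for free.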