Hacker News
by darkmighty 4080 days ago
I like this definition; it's almost equivalent to the one I gave below. If you have a good predictor you can compress the information well, but not optimally. To compress optimally you need more than an optimal (single-outcome) predictor: you need a predictor that outputs probabilities for the various events that are close to their true probabilities.
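
To make that concrete, here is a rough sketch (not a real compressor). It assumes an ideal entropy coder that spends -log2 q(x) bits on an outcome x to which the model assigns probability q(x), so the expected cost per symbol is the cross-entropy between the true distribution and the model. The biased coin and the three models are made-up numbers for illustration, and `cross_entropy_bits` is a hypothetical helper name.

```python
import math

def cross_entropy_bits(p_true, q_model):
    """Expected bits per symbol when outcomes follow p_true
    but the coder assigns code lengths of -log2(q) from q_model."""
    return -sum(p * math.log2(q) for p, q in zip(p_true, q_model) if p > 0)

# Assumed true distribution of a biased coin: [P(heads), P(tails)].
p_true = [0.9, 0.1]

# Three hypothetical models. All of them "predict heads" as the single
# most likely outcome (or tie), but they differ in the probabilities.
models = {
    "calibrated": [0.9, 0.1],       # matches the true probabilities
    "overconfident": [0.99, 0.01],  # right guess, wrong confidence
    "uninformed": [0.5, 0.5],       # no knowledge of the bias
}

# The entropy of the source is the cost achieved by the true distribution.
entropy = cross_entropy_bits(p_true, p_true)
print(f"entropy of the source: {entropy:.3f} bits/symbol")

for name, q in models.items():
    cost = cross_entropy_bits(p_true, q)
    print(f"{name:>13}: {cost:.3f} bits/symbol (overhead {cost - entropy:.3f})")
```

Only the calibrated model reaches the entropy. The overconfident one makes the same single-outcome guess every time, yet pays extra bits whenever tails shows up, which is the gap between a good point predictor and a good probabilistic one.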

So in some sense optimal compression gives the best you could hope for, up to the limitations of the probabilistic model, which is why I like this explanation.