| The wiggle word here is "right", I suppose. It's easy to ascribe meanings to that word which are very difficult to use; my limited understanding of philosophy makes me think that this is the realm of ideas like "qualia". For a long time statisticians wrangled over this word in a reduced context. The "art" of statistics is to build a model of the world which is sufficiently detailed to capture interesting data but not so detailed as to be difficult to interpret as a human decision-maker. Statisticians usually solve this problem by building a lot of models, getting lucky, presenting things to people and seeing what sticks. For a long time the lack of a notion of "rightness" was so limiting that it precluded advancement of the field in certain ways. With the advent of computers, however, we discovered a new, more precise form of "right", and this formed the bedrock of Machine Learning. The "right" that ML is concerned with is predictive power. A model is "right" when it leads to a training and prediction algorithm which is "probably approximately correct", i.e. you can feed real data in and, with a high degree of probability, end up with something useful. So with respect to computer vision, we know that it is very difficult to build "efficient" algorithms, ones which work well while using a reasonable amount of training data. CV moved forward when it realized that there were representations of the visual field which led to better predictive power; these were originally generated by studying the visual centers of human and animal brains, but more recently have been generated "naively" by computers. So there's a reasonably well-defined way that we can find the "right" representation of visual scenes: if we find one which is ultimately best-in-class among all representations for any choice of ML task, then it's "right". |
So in some sense optimal compression gives the best you could hope for, up to the limitations of the probabilistic models, which is why I like this explanation.