by tel 4080 days ago
The wiggle word here is "right", I suppose. It's easy to ascribe meanings to that word which are very difficult to use; my limited understanding of philosophy makes me think that this is the realm of ideas like "qualia".

For a long time statisticians wrangled over this word in a reduced context. The "art" of statistics is to build a model of the world which is sufficiently detailed to capture interesting data but not so detailed as to make it difficult for a human decision-maker to interpret. Statisticians usually solve this problem by building a lot of models, getting lucky, presenting things to people and seeing what sticks.

For a long time this lack of a notion of "rightness" was so limiting that it held back the field in certain ways.

With the advent of computers, however, we discovered a new, even more precise form of "right", and this formed the bedrock of Machine Learning. The "right" that ML is concerned with is predictive power. A model is "right" when it leads to a training and prediction algorithm which is "probably approximately correct", i.e. you can feed real data in and, with high probability, end up with something useful.
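
For reference, the usual textbook way to pin down "probably approximately correct" looks roughly like the statement below. The symbols are my own labels, not anything from the comment: D is the data distribution, S is a training sample of m points, h_S is the model learned from S, epsilon is the error you tolerate ("approximately") and delta is the failure probability you tolerate ("probably").

```latex
% Sketch of the standard PAC guarantee; all symbols are illustrative labels.
% For every \varepsilon, \delta \in (0, 1), given a sample size m that is
% polynomial in 1/\varepsilon and 1/\delta:
\Pr_{S \sim D^m}\bigl[\, \operatorname{err}_D(h_S) \le \varepsilon \,\bigr] \;\ge\; 1 - \delta
```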

So with respect to computer vision, we know that it is very difficult to build "efficient" algorithms, meaning ones which work well while using a reasonable amount of training data. CV moved forward when researchers realized that there were representations of the visual field which led to better predictive power. These were originally generated by studying the visual centers of human and animal brains, but more recently have been generated "naively" by computers.

So, there's a reasonably well-defined way that we can find the "right" representation of visual scenes: if we find one which is ultimately best-in-class among all representations for any choice of ML task, then it's "right".

1 comment

I like this definition; it's almost equivalent to the one I gave below: if you have a good predictor you can compress the information well, but not optimally. To compress optimally, you need more than an optimal (single-outcome) predictor; you need a predictor that outputs probabilities for the various events that are close to the true probabilities.

So in some sense optimal compression gives the best you could hope for, up to the limitations of the probabilistic models, which is why I like this explanation.
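
To make the prediction/compression link concrete, here is a minimal sketch. It assumes an ideal entropy coder that spends -log2(q) bits on an event the model assigns probability q, so the average cost per symbol is the cross-entropy between the true distribution and the model. The 90/10 source and the three models are made-up numbers for illustration, and `expected_code_length` is a hypothetical helper name.

```python
import math


def expected_code_length(true_p, model_q):
    """Average bits per symbol when data drawn from true_p is coded with an
    ideal code built from model_q (the cross-entropy H(true_p, model_q)).
    Assumes model_q is nonzero wherever true_p is nonzero."""
    return -sum(p * math.log2(q) for p, q in zip(true_p, model_q) if p > 0)


# Hypothetical binary source: event A happens 90% of the time, event B 10%.
true_p = [0.9, 0.1]

# A model with no predictive power at all.
uniform_q = [0.5, 0.5]

# A predictor that picks the right single outcome (A) but overstates its
# confidence; the 99% figure is an assumption for illustration.
overconfident_q = [0.99, 0.01]

# A predictor whose probabilities match the true ones.
calibrated_q = [0.9, 0.1]

uniform_bits = expected_code_length(true_p, uniform_q)
overconfident_bits = expected_code_length(true_p, overconfident_q)
calibrated_bits = expected_code_length(true_p, calibrated_q)

print(f"uniform:       {uniform_bits:.3f} bits/symbol")
print(f"overconfident: {overconfident_bits:.3f} bits/symbol")
print(f"calibrated:    {calibrated_bits:.3f} bits/symbol")
```

The overconfident model guesses the most likely outcome correctly every time, so it beats the uniform one, but it still pays more bits per symbol than the calibrated model, whose cost equals the entropy of the source. That gap is the "well, but not optimally" part.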