by darkmighty 4081 days ago
The question is fundamental to all kinds of recognition: recognizing the invariants of a scene, i.e. the data that distinguishes it from other scenes, which is very close to the definition of Shannon information.

For example, if you can extract a 'mesh' from a 2D picture, you can generate many other viewpoints, so that mesh can be considered a good representation. If you are more sophisticated, however (and perhaps have a larger "dictionary"), you can instead extract 'There are two wooden chairs 1m from each other, ...'.
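
To make the mesh point concrete, here is a rough sketch of how one 3D representation yields many 2D views. Everything in it is an assumption for illustration: the cube stands in for an extracted mesh, and `project`, `camera_distance` and `focal_length` are hypothetical names for a simple pinhole camera, not anything from a real vision pipeline.

```python
import math

# Hypothetical "extracted mesh": the 8 corners of a unit cube centred on the origin.
vertices = [(x, y, z)
            for x in (-0.5, 0.5)
            for y in (-0.5, 0.5)
            for z in (-0.5, 0.5)]

def project(points, yaw_degrees, camera_distance=3.0, focal_length=1.0):
    """Rotate points about the vertical axis, then apply a pinhole projection."""
    yaw = math.radians(yaw_degrees)
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    projected = []
    for x, y, z in points:
        # Rotating the object about Y is equivalent to orbiting the camera around it.
        xr = cos_y * x + sin_y * z
        zr = -sin_y * x + cos_y * z
        depth = zr + camera_distance  # distance from the camera along its viewing axis
        projected.append((focal_length * xr / depth, focal_length * y / depth))
    return projected

# The same mesh rendered from two viewpoints that were never photographed.
front_view = project(vertices, 0)
angled_view = project(vertices, 45)
```

A single 2D picture only gives you `front_view`; the mesh gives you `project(vertices, angle)` for any angle, which is why it counts as the better representation here.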

That's the sense in which the representation is fundamental to computer vision -- it distills what the system knows (or what it wants to know) about scenes. The more concise the representation without loss of information, the smarter your system is (and past a certain point this becomes a general AI problem).
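
A toy illustration of "concise without loss of information", under heavy assumptions: the scene is a made-up 64x64 binary grid containing only filled, axis-aligned, non-touching rectangles, and `render` / `extract` are hypothetical helpers written just for this sketch.

```python
WIDTH, HEIGHT = 64, 64

# Compact description of a hypothetical scene: (x, y, width, height) per object.
scene = [(5, 10, 8, 12), (40, 30, 8, 12)]

def render(objects, width=WIDTH, height=HEIGHT):
    """Expand the compact description into a raw pixel grid."""
    grid = [[0] * width for _ in range(height)]
    for x, y, w, h in objects:
        for row in range(y, y + h):
            for col in range(x, x + w):
                grid[row][col] = 1
    return grid

def extract(grid):
    """Recover rectangles from pixels.

    Assumes filled, axis-aligned rectangles that do not touch, so the
    top-left corner is the only lit pixel with nothing lit to its left or above.
    """
    objects = []
    for y, row in enumerate(grid):
        for x, value in enumerate(row):
            left = row[x - 1] if x > 0 else 0
            above = grid[y - 1][x] if y > 0 else 0
            if value == 1 and left == 0 and above == 0:
                w = 0
                while x + w < len(row) and row[x + w] == 1:
                    w += 1
                h = 0
                while y + h < len(grid) and grid[y + h][x] == 1:
                    h += 1
                objects.append((x, y, w, h))
    return objects

pixels = render(scene)
recovered = extract(pixels)

raw_values = WIDTH * HEIGHT          # numbers needed for the raw grid
compact_values = 4 * len(recovered)  # numbers needed for the description
lossless = render(recovered) == pixels
```

Here the grid takes `raw_values` numbers and the description takes `compact_values`, yet `lossless` confirms nothing was thrown away. The catch is that `extract` only works because it already "knows" the world is made of rectangles; that built-in knowledge is the dictionary, and growing it to cover real scenes is where it turns into the hard problem.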