by darkmighty
4081 days ago

The question is fundamental to every kind of recognition: recognizing the invariants of a scene, i.e. the data that distinguishes it from other scenes, which is very close to the definition of Shannon information. For example, if you can extract a 'Mesh' from a 2D picture, you can generate the scene from many other viewpoints, so that mesh can be considered a good representation. A more sophisticated system (perhaps one with a larger "dictionary") could instead extract 'There are two wooden chairs 1 m from each other, ...'. That is the sense in which the representation is fundamental to computer vision: it distills what the system knows (or wants to know) about scenes. The more concise the representation is without losing information, the smarter the system is (and past a certain point, this becomes a general AI problem).