by m3at
2494 days ago
We don't just have 2D data, though. We learn object representations by interacting with objects over years in a multimodal fashion. Take a simple drinking glass: we know its material properties (it is transparent, solid, and can hold liquids), its typical position (sitting on a tabletop, upright with the open side on top), its usage (grab it with a hand and bring it to the mouth), and so on. We also make heavy use of the time dimension: over a few seconds we see the same objects from different viewpoints and possibly in different states. Only after learning what a glass is can we easily recover its properties from a still 2D image. So at least for learning (it might be skippable at inference), it makes a lot of sense to me to have more than still 2D images.