using video captured in 3D is going to vastly improve representation learning, with the added benefit of depth perception, which lets humans and neural nets predict how an object's projection onto a 2D plane should look as it moves
say the videos are captured using a pair of identical cameras on a phone, a feature I have been waiting a while to see on flagship phones and then mass-adopted
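a minimal sketch of why two identical cameras give depth, assuming an idealized rectified stereo pair (pinhole cameras, parallel optical axes, horizontal offset only). the names `project` and `depth_from_disparity`, the 1000 px focal length and the 12 mm baseline are hypothetical illustration values, not specs of any real phone. each camera projects a 3D point onto its 2D image plane; the horizontal shift between the two projections (the disparity) gives depth via Z = f * B / d

```python
def project(point, focal_px, cam_x=0.0):
    """Pinhole projection of a 3D point (X, Y, Z) in meters into a camera
    at (cam_x, 0, 0) looking down +Z. Returns image coords (u, v) in pixels."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    u = focal_px * (x - cam_x) / z
    v = focal_px * y / z
    return u, v


def depth_from_disparity(u_left, u_right, focal_px, baseline_m):
    """Recover depth Z = f * B / d for an idealized rectified stereo pair,
    where d = u_left - u_right is the horizontal disparity in pixels."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline_m / disparity


if __name__ == "__main__":
    FOCAL_PX = 1000.0   # hypothetical focal length in pixels
    BASELINE_M = 0.012  # hypothetical 12 mm spacing between the two cameras

    # a point moving toward the cameras: its projections shift, and the
    # disparity between left and right images grows as it gets closer
    for z in (4.0, 2.0, 1.0):
        p = (0.5, 0.2, z)
        u_l, _ = project(p, FOCAL_PX, cam_x=0.0)
        u_r, _ = project(p, FOCAL_PX, cam_x=BASELINE_M)
        est = depth_from_disparity(u_l, u_r, FOCAL_PX, BASELINE_M)
        print(f"true Z={z:.1f} m  disparity={u_l - u_r:.2f} px  estimated Z={est:.2f} m")
```

with a small baseline like a phone's, disparity shrinks as distance grows, so depth estimates get coarser for far-away objects and are most precise up close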
such mass adoption would ensure there are vast amounts of training data from all kinds of situations to learn everything about the visual world and its physics
now pair it with other sensors: audio, temperature, weather, chemical, etc.
the model can learn to associate a boom with a flying jet, rumble with dark rolling clouds, and petrichor with rain on hot sand
we can slowly start to model more and more of human experience in a single model as computing power grows