Hacker News
by albert_e 672 days ago
my hypothesis:

using video captured in 3D is going to vastly improve the learning of representations -- with the additional benefit of depth perception, which lets humans/neural nets predict how objects' projections onto a 2D plane should look as they move

say the videos are captured using a pair of identical cameras on a phone -- a feature I have been waiting a while to see on flagship phones and adopted at scale
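
to make the two-camera idea concrete, here is a minimal sketch, assuming an idealized, rectified pinhole stereo pair. the function names and all numbers (focal length in pixels, 12 mm baseline, 4 px disparity) are made up for illustration, not taken from any real phone. depth comes from the standard relation Z = f * B / d, and once depth is known you can predict where a point lands on the 2D image after the camera moves:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth (metres) of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


def project_point(x, y, z, focal_px):
    """Pinhole projection of a 3D point (camera coordinates, metres) onto the image plane (pixels)."""
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (focal_px * x / z, focal_px * y / z)


# Hypothetical values: 1000 px focal length, 12 mm baseline, 4 px disparity.
z = depth_from_disparity(focal_px=1000.0, baseline_m=0.012, disparity_px=4.0)

# A point 0.5 m to the right of the optical axis at that depth...
u_before, v_before = project_point(0.5, 0.0, z, focal_px=1000.0)

# ...and the same point after the camera moves 1 m toward it.
u_after, v_after = project_point(0.5, 0.0, z - 1.0, focal_px=1000.0)

print(z, u_before, u_after)
```

with only one camera, z is unknown, so the shift from u_before to u_after cannot be computed directly and has to be inferred; with a stereo pair it falls out of the geometry.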

such mass adoption would ensure there are vast amounts of training data from all kinds of situations to learn everything about the visual world and its physics

now pair it with other sensors like audio, temperature, weather, chemicals, etc.

the model can learn to associate a boom with a flying jet, rumble with dark rolling clouds, and petrichor with rain on hot sand

we can slowly start to model more and more of human experience in a single model as computing power grows

1 comment

I think that is similar to what Yann LeCun outlined: https://bdtechtalks.com/2022/03/07/yann-lecun-ai-self-superv...