

by albert_e 672 days ago

my hypothesis:

using video captured in 3D is going to vastly improve the learning of representations -- with the additional benefit of depth perception, which lets humans/neural nets predict how the projections of objects onto a 2D plane should look as they move

say the videos are captured using a pair of identical cameras on a phone -- a feature I have been waiting a while to see arrive on flagship phones and get mass adopted
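
a rough sketch of why a pair of identical cameras gives you depth, assuming the two views are rectified (aligned side by side) so that a point shifts only horizontally between them. the function name and all the numbers here (focal length, baseline, disparities) are made up for illustration, not taken from any real phone:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres of a point seen by a rectified stereo pair.

    focal_px:     focal length in pixels (hypothetical value below)
    baseline_m:   distance between the two cameras in metres
    disparity_px: horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        # zero disparity would put the point infinitely far away
        raise ValueError("disparity must be positive")
    # standard pinhole stereo relation: Z = f * B / d
    return focal_px * baseline_m / disparity_px


# hypothetical setup: 1000 px focal length, cameras 1.2 cm apart
near = depth_from_disparity(1000.0, 0.012, 24.0)  # large shift, close object
far = depth_from_disparity(1000.0, 0.012, 2.0)    # small shift, distant object
print(near, far)
```

with these numbers the large shift works out to roughly half a metre and the small shift to roughly six metres. the small baseline on a phone means distant objects shift by only a pixel or two, so depth gets noisy with range.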

such mass adoption would ensure there are vast amounts of training data from all kinds of situations to learn everything about the visual world and its physics

now pair it with other sensors like audio, temperature, weather, chemicals, etc.

the model can learn to associate a boom with a flying jet, rumble with dark rolling clouds, and petrichor with rain on hot sand

we can slowly start to model more and more of human experience in a single model as computing power grows

1 comment

isusmelj 672 days ago

I think that is similar to what Yann LeCun outlined: https://bdtechtalks.com/2022/03/07/yann-lecun-ai-self-superv...
