Self-Supervised Learning for Videos


	Self-Supervised Learning for Videos (lightly.ai)
	91 points by sauravmaheshkar 673 days ago

5 comments

albert_e 672 days ago

my hypothesis:

using video captured in 3D is going to vastly improve representation learning, with the added benefit of depth perception, which lets humans and neural nets predict how an object's projection onto a 2D plane should look as it moves

say the videos are captured with a pair of identical cameras on a phone, a feature I have been waiting a while to see on flagship phones and adopted at scale
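To make the depth point concrete: a rectified pair of identical cameras recovers depth from disparity, the horizontal pixel shift of the same point between the two images, via the pinhole relation Z = f·B/d (focal length f in pixels, baseline B in metres, disparity d in pixels). The sketch below assumes an ideal rectified pinhole pair; the phone-like focal length and 12 mm baseline in the usage note are hypothetical numbers chosen for illustration, not specs of any real device.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in metres of a point seen by an ideal rectified stereo pair.

    Pinhole-model assumption: Z = f * B / d, where f is the focal length in
    pixels, B the distance between the two camera centres in metres, and d
    the disparity in pixels.
    """
    if disparity_px <= 0:
        # Zero disparity means the point is at infinity; negative is invalid.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical phone-like values: f = 1500 px, B = 0.012 m (12 mm).
    for d in (18, 9, 1):
        print(d, "px ->", round(depth_from_disparity(d, 1500, 0.012), 2), "m")
```

With a baseline this short, disparity falls to about one pixel by roughly 18 m, so under these assumptions a phone-style pair gives useful depth mainly at close range.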

such mass adoption would ensure there are vast amounts of training data from all kinds of situations to learn everything about the visual world and its physics

now pair it with other sensors like audio, temperature, weather, chemicals, etc.

the model can learn to associate a boom with a flying jet, rumble with dark rolling clouds, and petrichor with rain on hot sand

we can slowly start to model more and more of human experience in a single model as computing power grows

isusmelj 671 days ago

I think that is similar to what Yann LeCun outlined: https://bdtechtalks.com/2022/03/07/yann-lecun-ai-self-superv...

optimalsolver 673 days ago

Rather than doing self-supervised learning on the actual video frames, why not do it on the byte sequence that represents the video file?
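One way to picture the byte-sequence idea is generic next-byte prediction: treat each byte of the file as a token in [0, 255] and train a model to predict the next byte from the preceding ones, so the labels come from the data itself. The sketch below only builds the (input, target) windows; the function name and window size are hypothetical, and this is a generic language-model setup, not the method of any particular paper.

```python
def next_byte_pairs(data: bytes, context: int) -> list[tuple[list[int], list[int]]]:
    """Cut a raw byte stream into (input, target) windows for next-byte prediction.

    Each byte becomes an integer token in [0, 255]. The target window is the
    input window shifted left by one position, so no human annotation is needed.
    Trailing bytes that do not fill a complete window are dropped.
    """
    tokens = list(data)
    pairs = []
    # Step by `context` so windows do not overlap; each slice needs context + 1 bytes.
    for start in range(0, len(tokens) - context, context):
        window = tokens[start : start + context + 1]
        pairs.append((window[:-1], window[1:]))
    return pairs


if __name__ == "__main__":
    # In practice `data` might be Path("clip.mp4").read_bytes() (hypothetical file).
    for inputs, targets in next_byte_pairs(b"abcdefghij", context=4):
        print(inputs, "->", targets)
```

A practical wrinkle is that compressed video bytes are highly entropic and depend on codec state far back in the stream, so context length matters much more here than it does for raw pixels.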

mkaic 673 days ago

You might find this paper interesting: [JPEG-LM: LLMs as Image Generators with Canonical Codec Representations](https://arxiv.org/abs/2408.08459)

optimalsolver 672 days ago

Thanks. This is exactly the kind of thing I was looking for.

byyoung3 673 days ago

Nice work!

joelio182 672 days ago

Very cool!

ljlolel 673 days ago

Cool!