HN Mirror



	by jakear 2114 days ago
	They could also be digging only into audio, doing speech recognition on it, then clustering the text. Augment that with the text users have put into the video directly using the in-app editor and you have some pretty solid data.
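A minimal sketch of the clustering step described above, assuming speech recognition has already produced a transcript for each video. The `videos` tuples, the `cluster_videos` helper, and the 0.3 threshold are all hypothetical, and the greedy bag-of-words grouping is a stand-in for illustration, not a claim about what any app actually runs.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_videos(videos, threshold=0.3):
    """Greedy single-pass clustering.

    Each video joins the first cluster whose seed vector is similar
    enough; otherwise it starts a new cluster. The threshold is an
    arbitrary illustrative value.
    """
    clusters = []  # list of (seed_vector, [video ids])
    for video_id, transcript, overlay_text in videos:
        # Combine the ASR transcript with the text typed in the in-app editor.
        vec = bag_of_words(transcript + " " + overlay_text)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(video_id)
                break
        else:
            clusters.append((vec, [video_id]))
    return [members for _, members in clusters]


# Hypothetical data: (video id, speech-recognition transcript, overlay text)
videos = [
    ("v1", "how to bake sourdough bread at home", "sourdough recipe"),
    ("v2", "my sourdough bread recipe and starter tips", "bread baking"),
    ("v3", "top ten goals from the football match", "football highlights"),
]
print(cluster_videos(videos))
```

With this toy data the two baking videos end up in one group and the football video in another. A real system would more likely use TF-IDF or learned text embeddings with a proper clustering algorithm.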

2 comments

ramimac 2114 days ago

If that were true, it'd be interesting to see if they push out support for closed captioning. It's an accessibility push, but it would also leverage a lot of the same capabilities...


novok 2114 days ago

I would also start doing image recognition in the video frames, to extract things like gender, objects, etc.


thekyle 2114 days ago

Would this have any advantage over just using video embeddings (or a sequence of frame embeddings), which in theory should capture those things in vectorized form?
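To make the embedding alternative concrete, here is a rough sketch that mean-pools a sequence of frame embeddings into one video-level vector and compares videos by cosine similarity. The 3-dimensional vectors are made-up toy values, and `mean_pool` is a hypothetical helper; real frame embeddings would come from an image model and have hundreds of dimensions.

```python
import math


def mean_pool(frame_embeddings):
    """Average per-frame vectors into a single video-level vector."""
    n = len(frame_embeddings)
    return [sum(dim) / n for dim in zip(*frame_embeddings)]


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy frame embeddings (assumed values, two frames per video).
cooking_a = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]]
cooking_b = [[0.7, 0.3, 0.1], [0.9, 0.0, 0.1]]
sports = [[0.0, 0.1, 0.9], [0.1, 0.0, 0.8]]

va, vb, vs = mean_pool(cooking_a), mean_pool(cooking_b), mean_pool(sports)

# The two cooking videos should be closer to each other than to the sports one.
print(cosine(va, vb) > cosine(va, vs))
```

Mean pooling discards frame order, so a sequence model would be needed if temporal structure matters. Explicit labels such as detected objects stay human-readable, while the pooled vector does not.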
