| HN Mirror



	by indoordin0saur 687 days ago
	Doesn't even need to be user-guided. Use videos that have audio. You could have one AI that generates a transcript using the audio/video and another that watches the video on mute and tries to read the lips. Feedback would then be provided by the AI that had access to the audio.
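	A rough sketch of what that feedback signal could look like, assuming the audio model's transcript is treated as the reference and the lip reader is scored by word error rate against it. The names `feedback` and `score_clip` and the two model callables are hypothetical stand-ins, not anything from a real library:

```python
from typing import Callable, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance computed over words rather than characters."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from the guess
                            curr[j - 1] + 1,      # extra word in the guess
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def feedback(teacher_transcript: str, student_guess: str) -> float:
    """Word error rate of the lip reader's guess; 0.0 means a perfect match."""
    ref = teacher_transcript.lower().split()
    hyp = student_guess.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    return word_edit_distance(ref, hyp) / len(ref)


def score_clip(clip: object,
               transcribe_with_audio: Callable[[object], str],
               read_lips_muted: Callable[[object], str]) -> float:
    """Score one clip: the audio model labels it, the muted model is graded.

    Both callables are placeholders for whatever models are actually used.
    """
    return feedback(transcribe_with_audio(clip), read_lips_muted(clip))
```

	The score is only as good as the audio model's transcript, so any mistakes it makes get passed on to the lip reader as if they were correct.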

1 comment

0cf8612b2e1e 687 days ago

I am thinking of the millions of hours of TV news. Presenters are almost always going to be in the same position in the frame, and the footage may already have high-quality transcripts.
