Hacker News new | ask | show | jobs
by barefeg 950 days ago
Out of curiosity, how does it compare to YouTube’s own generated transcripts?
4 comments

I would say that overall they are much, much better than the auto generated ones from YouTube. If the speaker speaks incredibly clearly and slowly, without slang, etc, then the built in ones are good enough. But in a tougher situation, the biggest whisper model achieves near superhuman accuracy— way better.
I found the YouTube one to be really bad for my voice and the way I speak - Whisper does it perfectly (using the large dataset).

English is my second language, and I mumble.

While it seems YouTube's auto-generated are hit or miss, I wonder if feeding them through an LLM can fix the mistakes and still get the video's idea out of them
I've found that to be the case. I typically don't want a full transcript -- I want the materials list, or a summary, or a counterargument. I've found it is totally sufficient to just plop the transcript into an LLM and ask for my desired output. No need to clean of the transcript ahead of time.
Whisper is generally better than the one in youtube.