by extra88 2338 days ago
The word accuracy of auto-generated captions is highly variable: sometimes it's good enough, sometimes it's not. In addition, proper captions should identify speaker changes, mark significant non-verbal sounds (e.g. [car honking]), and include punctuation, all things that most auto-generated caption services don't even attempt.

Again, what’s the standard for good enough? I just ran an interview through an AI service. It caught speaker changes and handled punctuation, about as well as a person would, no? I use human transcription personally, but it seemed plenty adequate as minimal transcription.
What are you looking for, a percentage? I can't give you one.

Think about what the point is: to give hearing-impaired people the same experience as hearing people. If you quizzed some people who only had the audio and others who only read the captions, both groups should be able to answer questions at the same rate of success (caption readers may even get more questions right, such as naming a speaker).

Some video captions could have numerous errors, but of a kind where the reader can easily tell what was meant. Other videos might have highly accurate captions with one essential word missed, changing the whole meaning.
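
To put rough numbers on that, here is a minimal sketch of word error rate (WER), one common way to score transcripts: the word-level edit distance divided by the number of words in the reference. The function name and the sample sentences are made up for illustration and are not taken from any captioning service.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Classic Levenshtein distance, computed over words instead of characters.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences, invented for this sketch.
reference = "the patient should not take this medication with food"
dropped_word = "the patient should take this medication with food"
misspellings = "the pashent shood not take this medicashun with food"

wer_dropped = word_error_rate(reference, dropped_word)
wer_misspelled = word_error_rate(reference, misspellings)
print(f"dropped 'not':  {wer_dropped:.2f}")
print(f"misspellings:   {wer_misspelled:.2f}")
```

In this made-up case the caption that drops "not" scores better (one error in nine words) than the one with three obvious misspellings, even though only the first reverses the meaning. That is why a single accuracy percentage is a weak standard for "good enough".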