by lostinthefield 1865 days ago
"Really bad" is an exaggeration, I think. The auto-transcription features in both Google Meet and Zoom are more than acceptable; they're often very useful for catching missed words during a meeting.

They trip up on technical jargon but handle everyday conversations just fine, including speaker detection, punctuation, idioms, etc.

But that's also a slightly different use case, where each speaker is in their own (somewhat) quiet environment and on separate connections (and thus audio tracks).

It's much harder to do all that after the fact, like with a recorded video.

I find Trint.com, which is partially automatic, to be good for that... the AI does a first pass, and a human cleans it up afterward. YouTube has a similar assisted-auto feature for its captions, minus speaker separation.