by retrac
658 days ago
I can't really understand speech these days without the captions to go with it. But I encounter discrepancies with AI-generated captions very often: I heard something, and from context I know I'm right and the AI is wrong. Whisper and other deep-learning-based speech systems in particular can generate very plausible misinterpretations: something that sounds similar and is grammatically plausible, but is not what was said. These are errors of a kind that a person with semantic understanding of what's going on would not make. So I am a little leery of them for that reason. I rely on them every day for captioning video and so on, but I haven't found any iteration I've tried reliable or comfortable for interactive use.

I've been noticing this as well. It's becoming a common problem. Also, many times I've noticed that if I hadn't heard the speech being captioned and only had the captioning to go by, I would have had little chance of correctly understanding what was actually said.