by emodendroket 1034 days ago
But we don’t solely rely on how well we hear since we have knowledge that allows us to correct for poor hearing based on what is being said rather than forging ahead with a nonsense transcription. Machine transcription is definitely faster and cheaper but the end product isn’t “better,” and anyone who has read it can attest to that.

> But we don’t solely rely on how well we hear since we have knowledge that allows us to correct for poor hearing based on what is being said rather than forging ahead with a nonsense transcription.

Good voice transcription AIs already do that too; that's why they work best if they know which language they're operating in, as that means they can use the language to build a model of the most likely words.
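To make that concrete, here's a toy sketch of the idea, not how any particular engine does it: each candidate transcription gets an acoustic score plus a language-model score, and the highest total wins. Everything in it is an assumption for illustration: the candidate strings, the scores, the bigram counts, and the names `ACOUSTIC_LOGPROB`, `BIGRAM_COUNTS`, `lm_logprob` and `best_transcription` are all made up.

```python
import math

# Hypothetical acoustic log-scores: how well each candidate matches the audio.
# The numbers are invented; here the nonsense candidate sounds slightly closer.
ACOUSTIC_LOGPROB = {
    "ivan to lenny's": -1.0,
    "i went to lenny's": -1.2,
}

# Toy bigram counts standing in for a language model of English (invented).
BIGRAM_COUNTS = {
    ("<s>", "i"): 50,
    ("i", "went"): 30,
    ("went", "to"): 40,
    ("to", "lenny's"): 2,
    ("<s>", "ivan"): 1,
}


def lm_logprob(sentence):
    """Sum of add-one-smoothed log bigram frequencies for the sentence."""
    words = ["<s>"] + sentence.split()
    total = sum(BIGRAM_COUNTS.values())
    denom = total + len(BIGRAM_COUNTS) + 1  # +1 leaves room for unseen bigrams
    return sum(
        math.log((BIGRAM_COUNTS.get(pair, 0) + 1) / denom)
        for pair in zip(words, words[1:])
    )


def best_transcription(candidates, lm_weight=1.0):
    """Pick the candidate with the best combined acoustic + language score."""
    return max(
        candidates,
        key=lambda s: candidates[s] + lm_weight * lm_logprob(s),
    )


if __name__ == "__main__":
    print(best_transcription(ACOUSTIC_LOGPROB))                 # language model on
    print(best_transcription(ACOUSTIC_LOGPROB, lm_weight=0.0))  # acoustics only
```

With the language model switched off (`lm_weight=0.0`) the acoustically closer nonsense wins; with it on, the more plausible sentence does. Real systems use far larger models, but the way the scores are combined is the point here.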

I think Apple's most recent WWDC even has a video about adding custom vocabulary for their speech engine to pick up on, which covered some details on this exact topic, though I can't search for it right now.

Undoubtedly so but I have yet to see one that doesn't make mistakes a human would be unlikely to. It is not an easy capability to reproduce and wouldn't have been my first choice if I wanted to talk about things it can do better than people.
> Undoubtedly so but I have yet to see one that doesn't make mistakes a human would be unlikely to.

Absolutely, AI is very rarely human in its failure modes, and often has novel and exciting failure modes instead.

But, on average… or so the marketing claims… it makes fewer mistakes.

For a while, it was possible to improve upon superhuman chess AIs by pairing them with a human; the combination was called a centaur. Eventually the AIs got too good even for that, as they stopped making the sorts of mistakes humans could spot, but in the meantime, even though they were superhuman, they had failure modes that we could help out with.

Assuming the intended audience is also humans, "exciting" errors seem worse in this instance, so I find it hard to credit these marketing claims.
Seem worse, sure.

Whether it actually is worse depends on the specific use of the transcription.

Consider "I went to Lenny's" being transcribed wrong by a human as "I went to Denny's" or by an AI as "Ivan to Lenny's".

Both are wrong, but if you get a human to check, we can be oblivious to the human mistake, both for the same reason it was made in the first place and because seeing text alters our perception of what we hear. The AI error, being inhuman, is one we can spot, while the human error is imperceptible.