
by johnmcd3 3165 days ago

> For many applications, making a transcription seems like an unnecessary step and source of errors. Skipping transcription when the user doesn't need it (most cases where I use it) would seem like a way to get some gain, but perhaps at the reduction of debuggability.

Agree. What's really needed is research into (and development supporting) how to combine the expertise of a speech recognition layer with the next layer in a machine learning pipeline. That higher layer would contain the domain-specific knowledge needed for the problem at hand, while still leveraging a speech layer trained on a broad speech data set with speech-specific learning (from Google, Microsoft, the community, etc.).

Today, how richly can information be shared between those layers? As far as I can see, with Google's speech API you can only pass in a limited list of expected domain-specific vocabulary.

Why not have speech tools at least output sets of possible transcriptions with associated probabilities? Do any of the top tools allow this?

Then you could at least train your next-level models with knowledge of where the ambiguity is greatest, and of what a few alternatives might have been for certain words or phrases...
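To make that concrete, here is a rough sketch of what a downstream layer could do if it received a list of candidate transcriptions with probabilities. Everything in it is made up for illustration: the `Hypothesis` type, the `DOMAIN_TERMS` weights, and the example sentences are assumptions, not the output format of any real speech API.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str    # one candidate transcription
    prob: float  # probability the (hypothetical) recognizer assigns to it


# Hypothetical domain knowledge: terms the higher layer expects, with boost weights.
DOMAIN_TERMS = {"metformin": 2.0, "dosage": 1.0}


def domain_score(text: str) -> float:
    """Sum the boost weights of the domain terms that appear in the text."""
    return sum(DOMAIN_TERMS.get(word, 0.0) for word in text.lower().split())


def rescore(nbest: list[Hypothesis], weight: float = 1.0) -> Hypothesis:
    """Pick the hypothesis with the best log-probability plus weighted domain score."""
    return max(nbest, key=lambda h: math.log(h.prob) + weight * domain_score(h.text))


def ambiguity(nbest: list[Hypothesis]) -> float:
    """Entropy (in nats) of the normalized candidate probabilities; higher means less certain."""
    total = sum(h.prob for h in nbest)
    return -sum((h.prob / total) * math.log(h.prob / total) for h in nbest if h.prob > 0)


# Invented example: the recognizer prefers a wrong split of an unfamiliar drug name.
nbest = [
    Hypothesis("increase the met forming dosage", 0.55),
    Hypothesis("increase the metformin dosage", 0.35),
    Hypothesis("increase the met for men dosage", 0.10),
]

best = rescore(nbest)
print(best.text)
print(round(ambiguity(nbest), 3))
```

With `weight=0.0` the recognizer's own top choice wins; with the domain weights applied, the second candidate overtakes it. The entropy value is one simple way a next-level model could be told where the speech layer was unsure.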