by tomasGiden
31 days ago
I’ve looked at the confidence outputs for the chosen words from several STT providers, and low confidence definitely indicates a risk that the word was misheard. Not always, though. Say someone says “1 2 3 4 <unintelligible> 6 7 8”: the model will happily write 5 in the middle and give it good confidence because, based on the context, it is the only likely word. It varies between STT providers, though. Basically, the reason they are so good on average is that they estimate what is most often said given the context, and that context is not only the audio but also what was transcribed previously. If you don’t want the result to be based on what is most likely to be said in context, and only on the audio around a single word, it is going to be awfully wrong most of the time.
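To make the “5 in the middle” case concrete, here is a toy sketch of how a context prior can swamp weak acoustic evidence. This is not how any particular STT provider computes confidence; the `posterior` function, the candidate words, and all the numbers are made up for illustration, and real systems are far more involved.

```python
def posterior(acoustic, prior):
    """Combine per-word acoustic likelihoods with a context prior via Bayes' rule."""
    scores = {word: acoustic[word] * prior.get(word, 0.0) for word in acoustic}
    total = sum(scores.values())
    return {word: score / total for word, score in scores.items()}


# Hypothetical acoustic likelihoods for the unintelligible segment:
# the audio alone barely separates the candidates.
acoustic = {"5": 0.26, "9": 0.25, "fine": 0.25, "hive": 0.24}

# Hypothetical context prior after "1 2 3 4": "5" is by far the likeliest next word.
context_prior = {"5": 0.97, "9": 0.01, "fine": 0.01, "hive": 0.01}

# Flat prior: pretend there is no context and only the audio counts.
flat_prior = {word: 0.25 for word in acoustic}

with_context = posterior(acoustic, context_prior)
audio_only = posterior(acoustic, flat_prior)

print("with context:", round(with_context["5"], 2))
print("audio only:  ", round(audio_only["5"], 2))
```

With these made-up numbers the confidence for “5” comes out at roughly 0.97 with context and 0.26 from the audio alone, which is the pattern described above: a high score on a word that was never actually heard clearly.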