by alkonaut
492 days ago
Doing audio-to-text requires a statistical model of which word or phrase a piece of sound is most likely to be. Without context, you can't do better than ranking the most likely candidates, where a common word beats an uncommon one. Having a task-specific dictionary at that point would help. One could also imagine doing it at the summary step, where the AI could simply be asked to do the phonetic analysis: "Here is a transcription of a meeting. Here is a list of terms/names/participants etc. Given the transcription and the meeting context/topics, and assuming the transcriber has made errors, replace similar-sounding words and terms with more likely ones from the context."
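
A rough sketch of both ideas in Python, in case it helps. Everything here is made up for illustration: the function names, the `cutoff` value and the glossary. `build_correction_prompt` just assembles the kind of prompt quoted above (sending it to a model is left out), and `correct_transcript` is a crude local stand-in that compares spelling with the standard library's `difflib`, which is not real phonetic matching.

```python
import difflib
import re


def build_correction_prompt(transcript: str, glossary: list[str]) -> str:
    """Assemble a post-correction prompt (hypothetical wording, no model call)."""
    terms = "\n".join(f"- {term}" for term in glossary)
    return (
        "Here is a transcription of a meeting. Assume the transcriber has made "
        "errors. Replace similar-sounding words with more likely ones from "
        "this list of terms, names and participants:\n"
        f"{terms}\n\nTranscription:\n{transcript}"
    )


def correct_transcript(transcript: str, glossary: list[str], cutoff: float = 0.8) -> str:
    """Replace words whose spelling closely resembles a glossary term.

    Spelling similarity is only a proxy for "sounds alike"; the cutoff is an
    arbitrary assumption and will need tuning to avoid false positives.
    """
    by_lower = {term.lower(): term for term in glossary}

    def fix(match: re.Match) -> str:
        word = match.group(0)
        candidates = difflib.get_close_matches(
            word.lower(), list(by_lower), n=1, cutoff=cutoff
        )
        # Keep the original word unless a glossary term is close enough.
        return by_lower[candidates[0]] if candidates else word

    return re.sub(r"[A-Za-z]+", fix, transcript)


if __name__ == "__main__":
    glossary = ["Kubernetes"]
    print(correct_transcript("We deployed cubernetes on Tuesday", glossary))
    # prints: We deployed Kubernetes on Tuesday
```

The local version only handles single words and near-identical spellings. The prompt route is the one that can actually use meeting context, which is the point of doing it at the summary step.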