
	by alkonaut 497 days ago
	Something I find annoying with automatic transcriptions and summaries, like the one built into Teams, is that they lack the context necessary to properly interpret what's being said. For example, if I have a meeting discussing products, abbreviations or systems with "internal" names, then it can't discern them or statistically rejects them, replacing them with its best guess for a dictionary word instead. So say we have a long call involving frequent mentions of a measure called pNet, pronounced in the meeting "Peenet". Then you end up with a transcription of a bunch of guys having a discussion about penises. Hilarious, the first few times. OK, always hilarious, but not so useful. Being able to set the system prompt for these transcriptions would be very useful. Like "You are a friendly bot transcribing meetings at a software company. Some common terms and abbreviations you'll encounter are...".


_joel 497 days ago

My favourite was Kubernetes in our meeting being referred to as Cuban Eighties. ⎈


thih9 497 days ago

Anecdotally, if you have an accent and want to reference Maltese Falcon[1], your voice recognition software may understand it as “Maltese f* off”.

[1]: https://en.m.wikipedia.org/wiki/The_Maltese_Falcon_(1941_fil...


sys_64738 497 days ago

Perhaps these will be flagged for the CIA or DEA to investigate due to illegal importation of Cubans from the enemy!


jvanderbot 497 days ago

This should be trivially solvable with a glossary as context, as you suggest. I bet the above repo would love a PR, too!


sesm 497 days ago

But the error happens in the 'audio to text' part, so a text prompt won't solve it. The way to fix it is probably fine-tuning the underlying audio-to-text model.


alkonaut 497 days ago

Doing audio-to-text requires having a statistical model for what word or phrase a piece of sound is most likely to be. Without context, you can't do better than ranking the most likely candidates where a common word is more likely than an uncommon one. Having a task-specific dictionary at that point would help.

One could also imagine doing it at the summary step, where the AI could simply be asked to do phonetic analysis. "Here is a transcription of a meeting. Here is a list of terms/names/participants etc. Given the transcription, the meeting context/topics and assuming the transcriber has made errors, replace similar-sounding words and terms with more likely ones from the context."
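
As a rough illustration of the post-correction idea, here is a minimal Python sketch that swaps in plain fuzzy string matching (the standard library's difflib) for the phonetic analysis an AI would be asked to do. The glossary, the "spoken form" hints, the function name correct_transcript and the 0.85 cutoff are all assumptions made up for this example; nothing here comes from Teams or any other product.

```python
import difflib

# Hypothetical glossary: canonical term -> ways a transcriber might render it.
GLOSSARY = {
    "pNet": ["peenet", "pee net"],
    "Kubernetes": ["cuban eighties"],
}

PUNCT = ".,!?;:"


def correct_transcript(text, glossary, cutoff=0.85):
    """Replace word runs that closely match a known mis-transcription."""
    # Map each lowercase spoken form back to its canonical term.
    variants = {}
    for term, spoken_forms in glossary.items():
        for form in spoken_forms:
            variants[form.lower()] = term
    max_len = max(len(form.split()) for form in variants)

    words = text.split()
    out = []
    i = 0
    while i < len(words):
        replaced = False
        # Try the longest word run first so multi-word forms win.
        for n in range(max_len, 0, -1):
            if i + n > len(words):
                continue
            chunk = " ".join(words[i:i + n])
            core = chunk.rstrip(PUNCT)
            trailing = chunk[len(core):]  # keep trailing punctuation
            match = difflib.get_close_matches(
                core.lower(), list(variants), n=1, cutoff=cutoff
            )
            if match:
                out.append(variants[match[0]] + trailing)
                i += n
                replaced = True
                break
        if not replaced:
            out.append(words[i])
            i += 1
    return " ".join(out)


print(correct_transcript("We deployed to Cuban Eighties yesterday.", GLOSSARY))
print(correct_transcript("The Peenet value dropped.", GLOSSARY))
```

String similarity is only a stand-in for sound similarity, so a real setup would likely need a phonetic encoding or a language model for terms whose spelling differs a lot from how they are heard.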


ukuina 497 days ago

Whisper accepts a system prompt.
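
For anyone who wants to try this, a minimal sketch assuming the open-source openai-whisper Python package, where the relevant argument to transcribe is called initial_prompt. It conditions decoding on preceding text rather than acting as a chat-style instruction, so a plain list of terms tends to be the natural thing to pass. The helper names, the "Glossary:" wording and the 800-character cap are assumptions for illustration; Whisper only uses a limited number of prompt tokens, so the list should stay short.

```python
def build_glossary_prompt(terms, max_chars=800):
    """Join glossary terms into a short prompt string.

    max_chars is an arbitrary cap chosen for this sketch, since the
    model only considers a limited amount of prompt text.
    """
    prefix = "Glossary: "
    kept = []
    for term in terms:
        candidate = prefix + ", ".join(kept + [term])
        if len(candidate) > max_chars:
            break  # stop before the prompt grows past the cap
        kept.append(term)
    return prefix + ", ".join(kept)


def transcribe_with_glossary(audio_path, terms, model_name="base"):
    """Transcribe a file with the glossary passed as initial_prompt."""
    # Imported lazily so the prompt helper works without whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(
        audio_path, initial_prompt=build_glossary_prompt(terms)
    )
    return result["text"]


print(build_glossary_prompt(["pNet", "Kubernetes"]))
```

Calling transcribe_with_glossary("meeting.wav", ["pNet", "Kubernetes"]) would then run the model with those spellings as context; the file name is hypothetical, and how much the prompt helps with a given accent or term is something to test rather than assume.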


collinmcnulty 497 days ago

Gong has such a feature. It’ll even expand out acronyms the first time they show up in the transcript.
