Hacker News
by titanomachy 841 days ago
> 3. Run it through a basic LLM to correct common transcription errors specific to your company's vocabulary and remove filler words.

Do you just include a list of common errors in the prompt? Or have you trained a model for this?

I would love to train a model, but no, it's just a list of common errors that I've encountered. The LLM can also fix them in context. For example, if you're talking about the AI 'Groq' but the transcription always says 'grok', then you can, in this step, ask the LLM to fix that error if the context is appropriate (i.e. it's being used as a noun).
I do this with just a prompt that sets the context of the meeting (if we have it), explains that this is a spoken meeting between n people, and asks the model to correct errors based on the entire context. It actually works decently out of the box.
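
For anyone who wants to try the prompt-only approach, here is a rough sketch of how such a prompt could be assembled in Python. The function name `build_correction_prompt`, the glossary format, and the exact wording are assumptions made for illustration, not what anyone in this thread actually runs. It only builds the prompt string; sending it to a model is left out.

```python
def build_correction_prompt(transcript, known_errors=None,
                            n_speakers=None, meeting_context=None):
    """Assemble a transcript-correction prompt (hypothetical helper).

    known_errors maps a frequent mis-transcription to the intended term,
    e.g. {"grok": "Groq"}.
    """
    parts = []
    if n_speakers:
        parts.append(
            f"The following is a transcription of a spoken meeting "
            f"between {n_speakers} people."
        )
    else:
        parts.append("The following is a transcription of a spoken meeting.")
    if meeting_context:
        parts.append(f"Meeting context: {meeting_context}")
    parts.append(
        "Correct transcription errors using the entire transcript as "
        "context, and remove filler words. Do not change anything else."
    )
    if known_errors:
        parts.append(
            "Common mis-transcriptions (fix only where the context fits):"
        )
        for wrong, right in known_errors.items():
            parts.append(f"- '{wrong}' is often meant to be '{right}'")
    parts.append("Transcript:")
    parts.append(transcript)
    return "\n".join(parts)


prompt = build_correction_prompt(
    "Person A: we could run this on grok instead.",
    known_errors={"grok": "Groq"},
    n_speakers=2,
    meeting_context="infrastructure planning at a tech company",
)
print(prompt)
```

The "fix only where the context fits" line is there so the model does not blindly replace every occurrence of a glossary term.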
Damn, I just tried this with ChatGPT (warning: contrived example) and it works surprisingly well:

PROMPT:

The following is a partial transcription of a meeting at a tech company. Please correct any transcription errors, using clues from context as well as the fact that this is a technical meeting.

Person A: we shouldn't use a sequel database to store this data, we probably want to use something Callum nor instead since it's more efficient than row-based storage. Something like a patchy park or arrow.

Person B: What's park, eh?

Person A: It's just a column-based storage format with efficient Jesus or snappy compression.

RESPONSE:

Person A: "we shouldn't use a SQL database to store this data, we probably want to use something columnar instead since it's more efficient than row-based storage. Something like Apache Parquet or Arrow."

Person B: "What's Parquet?"

Person A: "It's just a column-based storage format with efficient GZIP or Snappy compression."

EDIT: mistral 7B on the other hand botched this terribly, and made up a lot of nonsense unrelated to the transcript.

Yeah, it’s pretty neat and can be improved with more context for technical terms. You do risk some hallucinations, though.
Fantastic example.
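
On the hallucination risk mentioned above (and the Mistral 7B result): one cheap guard is to compare the corrected text against the original and flag outputs that drift too far. A minimal sketch using Python's standard `difflib`; the helper names and the 0.5 threshold are arbitrary assumptions, and a word-level diff will not catch a small but wrong substitution.

```python
import difflib


def divergence(original, corrected):
    """Return 1 minus difflib's similarity ratio over the word sequences.

    0.0 means the word sequences are identical; 1.0 means no words match.
    """
    a = original.lower().split()
    b = corrected.lower().split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return 1.0 - matcher.ratio()


def looks_hallucinated(original, corrected, max_divergence=0.5):
    """Flag a correction that differs too much from the source transcript.

    max_divergence is an assumed threshold; tune it on real transcripts.
    """
    return divergence(original, corrected) > max_divergence


raw = ("It's just a column-based storage format with efficient "
       "Jesus or snappy compression.")
fixed = ("It's just a column-based storage format with efficient "
         "GZIP or Snappy compression.")
print(looks_hallucinated(raw, fixed))
```

A light correction like the one in the example changes only a word or two, so it stays well under the threshold, while a response that invents unrelated content would share few words with the transcript and get flagged for review.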