by tsycho 1220 days ago
Is there an open source speech recognition model which can be restricted to a smaller domain-specific dictionary?

Use case: I want to transcribe my poker hands while playing, e.g. "Flop was 2 of spades, 3 of diamonds and King of spades", "Button raised to $20", etc.

When I tried Whisper and some other models, the recognition accuracy was atrocious: they kept producing non-poker words that sound similar to poker words. I want to restrict the search space to my own list of poker words, which should, in theory, significantly improve accuracy.

Any suggestions on how to go about this?

2 comments

You can give Whisper a prompt prefix: a short piece of text containing the desired vocab. It will likely improve accuracy for that specific domain.

Whisper's source is very readable; check out https://github.com/openai/whisper/blob/main/whisper/decoding...
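A rough sketch of the prompt idea, assuming the openai-whisper Python package and its `initial_prompt` argument to `transcribe`. The word list, the prompt wording, and the helper names (`build_vocab_prompt`, `transcribe_with_vocab`) are made up for illustration. Note that the prompt only biases decoding toward these words; it does not hard-restrict the output vocabulary.

```python
# Illustrative poker vocabulary; extend it with whatever terms you actually say.
POKER_VOCAB = [
    "flop", "turn", "river", "button", "small blind", "big blind",
    "raised", "called", "folded", "all in",
    "spades", "hearts", "diamonds", "clubs",
    "ace", "king", "queen", "jack",
]


def build_vocab_prompt(words):
    """Join the domain words into a short text to pass as Whisper's prompt."""
    return "Poker hand notes. Vocabulary: " + ", ".join(words) + "."


def transcribe_with_vocab(audio_path, words, model_name="base"):
    """Transcribe one audio file, biasing Whisper toward the given words."""
    # Imported lazily so the prompt helper works without the package installed
    # (pip install openai-whisper).
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, initial_prompt=build_vocab_prompt(words))
    return result["text"]
```

Usage would be something like `transcribe_with_vocab("hand_001.wav", POKER_VOCAB)`, where the file name is a placeholder.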

Vosk

https://alphacephei.com/vosk/lm

You can restrict the vocabulary however you like. For example, here is a chess app built with Vosk:

https://www.chessvis.com/
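For the poker case, a minimal sketch of the same approach, assuming the vosk Python package and a model that supports runtime vocabulary changes (as far as I know this means the small models with a dynamic graph). The phrase list and the helper names (`build_grammar`, `transcribe_wav`) are invented for the example; the third argument to `KaldiRecognizer` is a JSON array of allowed phrases.

```python
import json

# Illustrative phrase list; the recognizer can only output words found here.
POKER_PHRASES = [
    "flop was", "turn was", "river was", "button raised to",
    "two three four five six seven eight nine ten",
    "jack queen king ace", "of", "and",
    "spades hearts diamonds clubs",
]


def build_grammar(phrases):
    """Return the JSON array string passed to KaldiRecognizer.

    "[unk]" lets the recognizer mark audio that matches none of the phrases.
    """
    return json.dumps(list(phrases) + ["[unk]"])


def transcribe_wav(wav_path, model_dir, phrases):
    """Transcribe a mono 16-bit PCM WAV file using only the given phrases."""
    # Imported lazily so build_grammar works without vosk installed
    # (pip install vosk).
    import wave
    from vosk import Model, KaldiRecognizer

    parts = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate(), build_grammar(phrases))
        while True:
            data = wf.readframes(4000)
            if len(data) == 0:
                break
            # AcceptWaveform returns True when an utterance has been finalized.
            if rec.AcceptWaveform(data):
                parts.append(json.loads(rec.Result())["text"])
        parts.append(json.loads(rec.FinalResult())["text"])
    return " ".join(p for p in parts if p)
```

Unlike a prompt, this is a hard restriction: anything outside the list comes back as `[unk]` or gets mapped to the nearest allowed word.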