by password4321 1154 days ago
Both speaker and speech recognition are done in the article using huggingface.

Is there anything as good ready to use on-prem for the diarization (speaker recognition)?

I've heard good things about whisper(.cpp) for speech recognition and vosk used to be king of that hill...

2 comments

Diarization can be done on-premises using pyannote (what they use in the article). Hugging Face offers a library to run things locally and an API to run things on their cloud. Pyannote is available under an MIT licence.
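
For anyone wanting to try it locally, here's a rough sketch of what that looks like. Assumptions on my part, not from the article: the model name `pyannote/speaker-diarization-3.1` and the `use_auth_token` argument follow pyannote.audio 3.x (other releases may name these differently), and `diarize` / `format_turns` are just helper names I made up. As far as I know the weights are fetched from the Hugging Face Hub the first time (you need a token for that), after which inference runs on your own machine.

```python
def diarize(path, hf_token):
    """Run pyannote speaker diarization locally.

    Returns a list of (start_seconds, end_seconds, speaker_label) tuples.
    """
    # Imported lazily so format_turns() below works without pyannote installed.
    from pyannote.audio import Pipeline

    # Assumed model name and argument, per pyannote.audio 3.x.
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token=hf_token
    )
    diarization = pipeline(path)
    return [
        (turn.start, turn.end, speaker)
        for turn, _, speaker in diarization.itertracks(yield_label=True)
    ]


def format_turns(turns):
    """Render (start, end, speaker) tuples as one readable line per turn."""
    return [f"{start:6.1f}s - {end:6.1f}s  {speaker}" for start, end, speaker in turns]
```

Something like `print("\n".join(format_turns(diarize("meeting.wav", "hf_...your token..."))))` would then list who spoke when; the file name and token there are placeholders.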
Vosk is really good, but it's also a good example of an open source project with great potential that doesn't scale up because the person behind it is a douchebag.

The documentation is poor, and what you can find on the web is sparse, outdated shit, so it's really hard to find help.