| HN Mirror

by password4321 1201 days ago

The article does both speaker recognition and speech recognition using Hugging Face.

Is there anything as good ready to use on-prem for the diarization (speaker recognition)?

I've heard good things about whisper(.cpp) for speech recognition, and vosk used to be king of that hill...

2 comments

rolisz 1201 days ago

Diarization can be done on premise using pyannote (what they use in the article). Hugging Face offers a library to run things locally and an API to run things on their cloud. pyannote is available under an MIT licence.
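For anyone wanting to try the local route, here is a minimal sketch of running a pretrained pyannote pipeline on your own machine. It assumes the `pyannote.audio` package with the `Pipeline.from_pretrained` interface and the `pyannote/speaker-diarization` model name as used in the 2.x/3.x releases; newer releases may name the token argument differently, so check the version you install. The helper names `diarize` and `format_turns` are made up for illustration. As far as I know the model weights are fetched from the Hugging Face Hub on first use (a token is needed for gated models), and inference itself runs locally.

```python
from typing import Iterable, List, Tuple

# (start seconds, end seconds, speaker label)
Turn = Tuple[float, float, str]


def diarize(audio_path: str, hf_token: str) -> List[Turn]:
    """Run a pretrained pyannote pipeline locally and return speaker turns."""
    # Imported lazily so format_turns() works even without pyannote installed.
    from pyannote.audio import Pipeline

    # Assumption: model name and `use_auth_token` match pyannote.audio 2.x/3.x.
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization", use_auth_token=hf_token
    )
    annotation = pipeline(audio_path)
    return [
        (segment.start, segment.end, speaker)
        for segment, _, speaker in annotation.itertracks(yield_label=True)
    ]


def format_turns(turns: Iterable[Turn]) -> List[str]:
    """Render speaker turns as one human-readable line each."""
    return [f"{start:.2f}-{end:.2f} {speaker}" for start, end, speaker in turns]
```

Usage would look something like `print("\n".join(format_turns(diarize("meeting.wav", "hf_...")))`, where the file name and token are placeholders.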

boredemployee 1201 days ago

vosk is really good, but it's also a good example of an open source project with great potential that doesn't scale up because the person behind it is a douchebag.

documentation is poor, and what you do find is sparse, outdated shit scattered around the web, so it's really hard to find help.