

by genewitch 795 days ago

Whisper works... kinda. I'm hoping another set of models gets released at some point. The error rate doesn't bother me much because I'm transcribing TV and radio shows for personal use, so it isn't mission critical.

There are a few Whisper diarization "projects", but I've never been able to get any of them to work. Whisper does have word-level timestamps, so it should be simple to "plug in" diarization.
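Here is a rough sketch of that "plug in" step. It assumes you already have Whisper's word-level timestamps and a list of speaker turns from some diarizer, both reduced to plain (text or label, start, end) values in seconds. The `Word`/`SpeakerTurn` classes, the helper functions and the "SPEAKER_00" style labels are hypothetical names for illustration, not any project's actual API. The sketch only does the alignment, not the transcription or the diarization itself:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    start: float  # seconds from start of audio
    end: float


@dataclass
class SpeakerTurn:
    speaker: str  # hypothetical diarizer label, e.g. "SPEAKER_00"
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Seconds of intersection between two intervals; 0.0 if they are disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def distance(point: float, turn: SpeakerTurn) -> float:
    """Seconds from a point in time to the nearest edge of a turn; 0.0 if inside it."""
    return max(0.0, turn.start - point, point - turn.end)


def assign_speakers(words: list[Word], turns: list[SpeakerTurn]) -> list[tuple[Optional[str], str]]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that fall in a gap between turns go to the turn nearest the word's
    midpoint. With no turns at all, every word is labeled None.
    """
    labeled: list[tuple[Optional[str], str]] = []
    for w in words:
        if not turns:
            labeled.append((None, w.text))
            continue
        best = max(turns, key=lambda t: overlap(w.start, w.end, t.start, t.end))
        if overlap(w.start, w.end, best.start, best.end) == 0.0:
            mid = (w.start + w.end) / 2
            best = min(turns, key=lambda t: distance(mid, t))
        labeled.append((best.speaker, w.text))
    return labeled


def to_transcript(labeled: list[tuple[Optional[str], str]]) -> list[str]:
    """Merge consecutive words from the same speaker into "SPEAKER: text" lines.

    Word text is stripped first, since word tokens may carry leading spaces.
    """
    lines: list[tuple[Optional[str], list[str]]] = []
    for speaker, text in labeled:
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(text.strip())
        else:
            lines.append((speaker, [text.strip()]))
    return [f"{spk}: {' '.join(toks)}" for spk, toks in lines]
```

Picking the turn with the largest overlap, rather than just looking at a word's start time, keeps a word that straddles a turn boundary with whichever speaker covers most of it.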

I don't need an LLM or whatever else this project has, but I'll see if it's runnable and whether it's any better than what a couple of podcasts I listen to use.

edit: I see some people mentioning whisperX, which is one of those things that was cool until moving fast broke things:

>As of Oct 11, 2023, there is a known issue regarding slow performance with pyannote/Speaker-Diarization-3.0 in whisperX. It is due to dependency conflicts between faster-whisper and pyannote-audio 3.0.0. Please see this issue for more details and potential workarounds.

which means that what I'd gain is a ~3x speedup with large-v2, but I'd instantly lose those gains to diarization unless I track down workarounds for an 8-month-old bug.

I'll stick with the Python venv Whisper install I've been using for the last 16 months, tyvm


forgingahead 795 days ago

Re: diarization, I had decent results testing this on Colab a while ago:

https://github.com/MahmoudAshraf97/whisper-diarization

I remember the usual Python package hell when NeMo got updated at some point, but it seems to be fairly well maintained, so give it a go.

*Edit: I remember reading somewhere that pyannote was a weak link in other repos; that might be why your other tests didn't go well.
