

by utrack 493 days ago

https://github.com/m-bain/whisperX looks promising. I'm hacking away on an always-on transcriber for my notes, for later search and recall. It supports diarization (the speaker detection you're looking for).
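
Diarization output is usually a list of speaker turns with timestamps, kept separate from the transcript segments, so something has to merge the two. whisperX handles that step itself; the sketch below is not its code, just a plain-Python illustration of the common idea: label each transcript segment with the speaker whose turns overlap it the most. The `Segment`/`Turn` types and the `SPEAKER_00`-style labels are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    start: float  # seconds from start of recording
    end: float
    text: str
    speaker: Optional[str] = None


@dataclass
class Turn:
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Seconds shared by two intervals; 0.0 when they don't intersect."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Segment]:
    """Label each segment (in place) with the speaker whose turns overlap it most."""
    for seg in segments:
        totals: Dict[str, float] = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        # No overlapping turn at all: leave the segment unlabelled.
        seg.speaker = max(totals, key=totals.get) if totals else None
    return segments


# Hypothetical example data.
segments = [
    Segment(0.0, 4.0, "Shall we start?"),
    Segment(4.0, 9.0, "Sure, one second."),
    Segment(20.0, 21.0, "Hello?"),
]
turns = [Turn(0.0, 4.5, "SPEAKER_00"), Turn(4.5, 9.5, "SPEAKER_01")]
for seg in assign_speakers(segments, turns):
    print(seg.speaker, seg.text)
```

Summing overlap per speaker (rather than taking the single longest turn) keeps a speaker who talks in several short bursts inside one segment from losing to one slightly longer turn.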

That said, what I'm actually hacking on right now is a mix of https://github.com/speaches-ai/speaches and https://github.com/ufal/whisper_streaming, mostly because my laptop doesn't have a decent GPU: I stream the audio to a home server instead.
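
For the "stream the audio to a home server" part, here is a minimal sketch that assumes the server accepts raw 16 kHz mono signed 16-bit little-endian PCM over a plain TCP socket. As far as I know, whisper_streaming's bundled server works roughly this way, while speaches exposes an HTTP API instead, so check the relevant README before relying on this. The function names are hypothetical.

```python
import socket
from typing import BinaryIO

# Assumed audio format: 16 kHz, mono, signed 16-bit little-endian PCM.
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2


def chunk_size(seconds: float) -> int:
    """Bytes covering `seconds` of audio, rounded down to whole samples."""
    return int(SAMPLE_RATE * seconds) * BYTES_PER_SAMPLE


def stream_pcm(source: BinaryIO, sock: socket.socket, seconds: float = 0.5) -> int:
    """Copy raw PCM from `source` to `sock` in chunks of up to `seconds` of audio; return bytes sent."""
    size = chunk_size(seconds)
    sent = 0
    while True:
        chunk = source.read(size)
        if not chunk:  # EOF: the recorder stopped
            break
        sock.sendall(chunk)
        sent += len(chunk)
    return sent


def stream_to_server(source: BinaryIO, host: str, port: int) -> int:
    """Open a TCP connection to the transcription server and stream until EOF."""
    with socket.create_connection((host, port)) as sock:
        return stream_pcm(source, sock)
```

Typical use would be piping a recorder's raw output into a script that calls `stream_to_server(sys.stdin.buffer, "homeserver.lan", 9000)`; the host name and port are placeholders for whatever your server listens on.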

Overall it's pretty simple once you wrangle the Python dependencies: all you need is a sink for the text files (for example, a new file for every Teams meeting, but that's another story...).
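
The "sink for the text files" can be as small as the following sketch: one timestamped file per session, with each transcribed line appended as it arrives. The `TranscriptSink` class, its file-naming scheme, and the manual `start_session()` call are assumptions; detecting when a Teams meeting actually starts is left out.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


class TranscriptSink:
    """Append transcript lines to one plain-text file per meeting/session."""

    def __init__(self, root) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.current: Optional[Path] = None

    def start_session(self, label: str, now: Optional[datetime] = None) -> Path:
        """Begin a new file named after the start time and a sanitised label."""
        now = now or datetime.now()
        # Keep letters, digits, '-' and '_'; replace everything else with '_'.
        safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in label)
        self.current = self.root / f"{now:%Y-%m-%d_%H%M%S}_{safe}.txt"
        self.current.touch()
        return self.current

    def write(self, text: str, now: Optional[datetime] = None) -> None:
        """Append one timestamped line to the current session's file."""
        if self.current is None:
            raise RuntimeError("call start_session() first")
        now = now or datetime.now()
        with self.current.open("a", encoding="utf-8") as f:
            f.write(f"[{now:%H:%M:%S}] {text.strip()}\n")
```

Plain dated text files keep the "search and recall" side simple: `grep -r` over the directory already gets you most of the way.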


mijoharas 493 days ago

Any good solutions for capturing the audio streams and piping them where they're needed, i.e., both the microphone and the speakers? I was wondering if I'd need to mess with PulseAudio and/or JACK (PipeWire under the hood, I mean, but I think those APIs sit on top and might be clearer).


mijoharas 493 days ago

Never mind, I played around a little, and PulseAudio's CLI makes it easy enough to sling some loopback/virtual devices around that you can then read from.
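
One way to wire that up, as a minimal sketch: create a null sink, loop both the microphone and the speaker monitor into it, then record the null sink's own monitor with `parec`. The device names below are hypothetical placeholders (list yours with `pactl list short sources`), and the helper functions are illustrative, not part of any tool. `pactl load-module` prints the new module's index, which you can later pass to `pactl unload-module` to tear things down.

```python
import shlex
import subprocess

# Hypothetical device names: replace with the output of `pactl list short sources`.
MIC_SOURCE = "alsa_input.example.analog-stereo"
SPEAKER_MONITOR = "alsa_output.example.analog-stereo.monitor"
MIX_SINK = "transcribe_mix"


def setup_commands(mic=MIC_SOURCE, monitor=SPEAKER_MONITOR, sink=MIX_SINK):
    """pactl calls that create a virtual sink and loop mic + speaker audio into it."""
    return [
        ["pactl", "load-module", "module-null-sink", f"sink_name={sink}"],
        ["pactl", "load-module", "module-loopback", f"source={mic}", f"sink={sink}"],
        ["pactl", "load-module", "module-loopback", f"source={monitor}", f"sink={sink}"],
    ]


def record_command(sink=MIX_SINK, rate=16_000):
    """parec call that reads the mixed audio from the virtual sink's monitor as raw PCM."""
    return ["parec", f"--device={sink}.monitor", "--format=s16le",
            f"--rate={rate}", "--channels=1"]


def run_setup(dry_run=True):
    """Print the commands (default) or actually execute them."""
    for cmd in setup_commands():
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

The raw stdout of `record_command()` can then be piped into whichever transcriber or network client you use.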


ewuhic 493 days ago

So which are you "hacking away on" in the end?
