


	by mijoharas 492 days ago
	I was looking into something like this for Linux recently and didn't find anything obviously simple. I considered hooking up whisper.cpp with a bit of audio magic so it would at least transcribe, but firstly it seemed like a fair bit of a pain, and secondly I couldn't think of a nice way to do speaker detection.


utrack 492 days ago

https://github.com/m-bain/whisperX looks promising. I'm hacking away on an always-on transcriber for my notes, for later search and recall. It supports diarization (the speaker detection you're looking for).

I'm currently hacking away on a mix of https://github.com/speaches-ai/speaches + https://github.com/ufal/whisper_streaming though, mostly because my laptop doesn't have a decent GPU, so I stream the audio to a home server instead.

But overall it's pretty simple to do after you wrangle the Python dependencies - all you need is a sink for the text files (for example, create a new file for every Teams meeting, but that's another story...)
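A rough sketch of what such a text-file sink could look like, assuming one file per session grouped into per-day directories. The function names, the directory layout, and the `[speaker]` prefix are all made up for illustration; nothing here comes from whisperX, speaches, or whisper_streaming.

```python
from datetime import datetime
from pathlib import Path


def open_transcript(root: Path, label: str, started: datetime) -> Path:
    """Create an empty transcript file for one session and return its path.

    Hypothetical layout: <root>/<YYYY-MM-DD>/<HHMMSS>_<label>.txt
    """
    # Replace anything that is awkward in a file name with an underscore.
    safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in label)
    day_dir = root / started.strftime("%Y-%m-%d")
    day_dir.mkdir(parents=True, exist_ok=True)
    path = day_dir / f"{started.strftime('%H%M%S')}_{safe}.txt"
    path.touch()
    return path


def append_line(path: Path, text: str, speaker: str = "") -> None:
    """Append one transcribed line, optionally prefixed with a speaker label."""
    prefix = f"[{speaker}] " if speaker else ""
    with path.open("a", encoding="utf-8") as f:
        f.write(prefix + text.strip() + "\n")
```

Whatever produces the text (a streaming transcriber, a batch job) would call `open_transcript` once per meeting and `append_line` for each segment; plain text files keep later search as simple as running grep over the directory.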


mijoharas 492 days ago

Any good solutions for capturing the audio streams and piping them where they're needed? That is, both microphone and speakers. I was wondering whether I need to mess with PulseAudio and/or JACK (it's PipeWire under the hood, but I think those APIs sit on top and might be clearer).


mijoharas 492 days ago

Never mind, I played around a little, and PulseAudio's CLI makes it easy enough to sling some loopback/virtual devices around that you can then read from.
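One possible shape of that loopback setup, as a sketch rather than the exact commands used here. It assumes the common recipe of a null sink that both the microphone and the speaker monitor are looped into, then recording from the null sink's monitor. The sink name `transcribe_mix` is made up, and the source and sink names must come from your own `pactl list short sources` and `pactl list short sinks` output.

```python
import subprocess


def loopback_commands(mic_source, speaker_sink, mix_sink="transcribe_mix"):
    """Build pactl commands that mix microphone and speaker audio into one sink.

    mic_source and speaker_sink are device names from `pactl list short ...`;
    mix_sink is a hypothetical name for the virtual sink created here.
    """
    return [
        # Virtual sink that only exists to collect the two streams.
        ["pactl", "load-module", "module-null-sink", f"sink_name={mix_sink}"],
        # Microphone -> virtual sink.
        ["pactl", "load-module", "module-loopback",
         f"source={mic_source}", f"sink={mix_sink}"],
        # Whatever plays on the speakers (the sink's monitor) -> virtual sink.
        ["pactl", "load-module", "module-loopback",
         f"source={speaker_sink}.monitor", f"sink={mix_sink}"],
    ]


def record_command(mix_sink="transcribe_mix", rate=16000):
    """Build a parec command that writes raw mono PCM from the mix to stdout."""
    return ["parec", f"--device={mix_sink}.monitor",
            "--format=s16le", f"--rate={rate}", "--channels=1"]


def apply(commands):
    """Run the commands for real; needs pactl and a running sound server."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `apply(loopback_commands(mic, speakers))` once sets up the devices, and the output of `record_command()` can then be piped into whatever does the transcription. The 16 kHz mono default is chosen because that is what Whisper models expect as input.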


ewuhic 492 days ago

So which are you "hacking away on" in the end?
