| HN Mirror


	by jeffharris 461 days ago
	We're thinking about diarization (adding time awareness to GPT models), but there are no firm plans to share just yet.

2 comments

youssefabdelm 461 days ago

Jeff, you know what would be magical? Not just vanilla diarization with "Speaker 1" and "Speaker 2", but the model picking up from the conversation that a speaker was referred to as "Jeff Harris" or "Jeff" and using that name instead.

youssefabdelm 461 days ago

Or if we could even provide samples of what an example speaker sounds like in general so that it would always classify them the way we want.

simonw 461 days ago

The feature I want is speaker differentiation - I want to feed in an audio file and get back a transcript with "Speaker 1: ..., Speaker 2: ..." indications.

That plus timestamps would be incredible.

The Google Gemini 2.0 models are showing some promise with this, though I can't speak to their reliability just yet.
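
For illustration, the output described here could be assembled by combining a timestamped transcript with speaker turns from a separate diarization step. The following is a minimal sketch under stated assumptions: `Segment`, `Turn`, and `label_segments` are hypothetical names, the data shapes do not match the output of any particular tool, and each transcript segment is simply assigned to the speaker turn it overlaps most in time.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed chunk of audio (hypothetical shape)."""
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class Turn:
    """A span of audio attributed to one speaker (hypothetical shape)."""
    start: float  # seconds
    end: float    # seconds
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments: list[Segment], turns: list[Turn]) -> list[str]:
    """Label each segment with the speaker whose turn overlaps it most."""
    lines = []
    for seg in segments:
        speaker = "Unknown"
        best = 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best:
                best = shared
                speaker = turn.speaker
        lines.append(f"[{seg.start:.1f}-{seg.end:.1f}] {speaker}: {seg.text}")
    return lines


segments = [
    Segment(0.0, 2.0, "Hello there."),
    Segment(2.0, 4.5, "Hi, how are you?"),
]
turns = [
    Turn(0.0, 2.1, "Speaker 1"),
    Turn(2.1, 5.0, "Speaker 2"),
]

for line in label_segments(segments, turns):
    print(line)
```

Segments that overlap no turn at all are labeled "Unknown" rather than guessed. Real audio with overlapping speech would need something more careful than a single best-overlap match.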

runeb 461 days ago

I had good results with pyannote and the following model for that use case in the past: https://huggingface.co/pyannote/speaker-diarization-3.1

infecto 461 days ago

I thought Deepgram already did speaker diarization (which is differentiation) pretty well. That and it can include timestamps plus other metadata.

thot_experiment 461 days ago

WhisperX does all of this; I use it all the time to transcribe meeting notes. It handles both speaker differentiation and individual word timestamps.
