| HN Mirror



	by TachyonicBytes 416 days ago
	Yes, this is exactly where I am going. The LLM also has an advantage, because you can give it the context of the audio (e.g. "this is an audio transcript from a radio show about etc. etc."). I can foresee this working for a future Whisper-like model as well. There are two ways to parse your first sentence. Are you saying that you used whisperX and it doesn't do well with diarization? I ask because I am curious about alternative ways of doing that.