by hildekominskia
328 days ago
Skip pyannote 3.1. Two battle-tested upgrades:

1. NVIDIA NeMo's `diar_msdd_telephonic` (8 kHz) or `diar_msdd_mic` (16 kHz): one-line Python install, GPU optional, beats pyannote on cross-talk.
2. AssemblyAI's async `/v2/transcript` endpoint: gives you `words[].speaker` plus Whisper-level accuracy for 40+ languages. Free tier: 3 h / month.

Glue either one to your existing Whisper pipeline and feed GPT-4o the speaker-tagged text. The jump in clarity is night-and-day. I use the same combo to auto-caption interviews, then drop the synced footage into Veo 3 (https://veo-3.app) for instant talking-head explainers. It works even for non-English audio.
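For the "speaker-tagged text" step, here is a minimal sketch of how a `words[]` array could be collapsed into one line per speaker turn before it goes into the prompt. It assumes each word is a dict with `text` and `speaker` keys, as in a diarized transcript response; the function name and the sample data are made up for illustration, and no API call is made.

```python
from itertools import groupby


def speaker_tagged_text(words):
    """Collapse a words[] list into one 'Speaker X: ...' line per turn.

    Assumes each item is a dict with 'text' and 'speaker' keys
    (an assumption about the payload shape, not a guaranteed schema).
    """
    lines = []
    # groupby merges only *consecutive* words from the same speaker,
    # so a speaker who talks twice gets two separate turns.
    for speaker, run in groupby(words, key=lambda w: w.get("speaker")):
        text = " ".join(w["text"] for w in run)
        label = speaker if speaker is not None else "unknown"
        lines.append(f"Speaker {label}: {text}")
    return "\n".join(lines)


# Hypothetical sample shaped like a diarized words[] array.
sample_words = [
    {"text": "Hello", "speaker": "A"},
    {"text": "there", "speaker": "A"},
    {"text": "Hi", "speaker": "B"},
    {"text": "Shall", "speaker": "A"},
    {"text": "we", "speaker": "A"},
    {"text": "start", "speaker": "A"},
]

print(speaker_tagged_text(sample_words))
```

The same function should work on NeMo output once its segments are mapped to the same `text` / `speaker` shape, which keeps the LLM prompt format independent of which diarizer produced it.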