by frangonf
51 days ago
Some months ago I looked into local options for ASR and diarization, and I had missed that VibeVoice now has this feature. My conclusion back then (which came only from shallow research on the topic and zero real experience, mind you) was that Whisper + Pyannote was the "stable" approach. Have VibeVoice, Voxtral, Qwen, or the NeMo solutions caught up in segmentation and speaker recognition?