by Teleoflexuous
815 days ago
Whisper doesn't, but WhisperX <https://github.com/m-bain/whisperX/> does. I am using it right now and it's perfectly serviceable. For reference, I'm transcribing research-related podcasts, meaning speech doesn't overlap much; overlapping speech would be a problem for WhisperX, from what I understand. There are also a lot of accents, which strain Whisper (though it still does well) but surely help WhisperX. It did have issues figuring out the number of speakers on its own, but that wasn't a problem for my use case.
Here’s an example for clarity:
1. AI is trained on the voice of a podcast host. As a side effect, it now (presumably) has all the information it needs to replicate the voice.
2. All past podcasts can then be processed, with the AI comparing each detected voice against the known voice, which leads to highly accurate labelling of that person.
3. Probably a nice side bonus: if two people with different registers are speaking over each other, the AI could separate them out. “That’s clearly person A and the other one is clearly person C.”
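
To make step 2 a bit more concrete, here is a rough sketch of the comparison. It assumes some model has already turned the host's voice and each detected speech segment into a fixed-length embedding vector; the vectors, the `label_segments` helper, and the 0.75 threshold are all made up for illustration and do not come from Whisper or WhisperX.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_segments(known_voices, segments, threshold=0.75):
    """Label each segment with the closest known voice, or "unknown".

    known_voices: dict mapping a speaker name to a reference embedding.
    segments: list of embeddings, one per detected speech segment.
    threshold: hypothetical minimum similarity to accept a match.
    """
    labels = []
    for embedding in segments:
        best_name, best_score = "unknown", threshold
        for name, reference in known_voices.items():
            score = cosine_similarity(reference, embedding)
            if score >= best_score:
                best_name, best_score = name, score
        labels.append(best_name)
    return labels


# Toy embeddings (assumed, not real model output).
known_voices = {"host": [0.9, 0.1, 0.2]}
segments = [[0.88, 0.12, 0.19], [0.1, 0.9, 0.3]]
print(label_segments(known_voices, segments))
```

In practice the embeddings would come from a speaker-embedding model, and the threshold would need tuning on real recordings. Separating overlapping speakers (step 3) is a different problem that this comparison alone doesn't solve.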