by fredley
3131 days ago
Sounds like that's potentially solvable by breaking the podcast down into chunks by speaking voice, then flagging any sections of ~30s whose voice differs from the rest of the episode. Guest speakers shouldn't get caught by this, since their voice would come up repeatedly in back-and-forth conversation rather than in one mostly unbroken 30s chunk.
Edit: https://github.com/ppwwyyxx/speaker-recognition/ looks like the first of a number of good starting points.
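To make the flagging idea concrete, here's a rough sketch of just the filtering step. It assumes some diarization tool has already produced (start, end, speaker) segments with consecutive same-speaker turns merged; it does not use the API of the linked repo. The `Segment` type, the `flag_ad_candidates` name, and the thresholds (20-45s, at most 2 turns) are all made up for illustration and would need tuning on real episodes.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    """One unbroken stretch of a single voice (hypothetical diarization output)."""
    start: float   # seconds
    end: float     # seconds
    speaker: str   # label assigned by the diarizer


def flag_ad_candidates(segments, min_len=20.0, max_len=45.0, max_turns=2):
    """Return segments that look like ~30s inserts by an otherwise absent voice.

    Thresholds are illustrative guesses, not measured values.
    """
    turns = defaultdict(int)        # number of turns per speaker
    talk_time = defaultdict(float)  # total seconds per speaker
    for seg in segments:
        turns[seg.speaker] += 1
        talk_time[seg.speaker] += seg.end - seg.start

    if not talk_time:
        return []

    # Treat the voice with the most talk time as the host and never flag it.
    host = max(talk_time, key=talk_time.get)

    flagged = []
    for seg in segments:
        duration = seg.end - seg.start
        is_short_block = min_len <= duration <= max_len
        # A guest takes many turns in a conversation; an ad voice shows up
        # once or twice and then disappears.
        is_rare_voice = turns[seg.speaker] <= max_turns
        if seg.speaker != host and is_short_block and is_rare_voice:
            flagged.append(seg)
    return flagged


if __name__ == "__main__":
    episode = [
        Segment(0, 300, "host"),
        Segment(300, 330, "ad_voice"),  # unbroken 30s from a one-off voice
        Segment(330, 340, "host"),
        Segment(340, 350, "guest"),
        Segment(350, 380, "host"),
        Segment(380, 410, "guest"),     # 30s long, but the guest has 3 turns
        Segment(410, 420, "host"),
        Segment(420, 425, "guest"),
    ]
    for seg in flag_ad_candidates(episode):
        print(f"{seg.start:.0f}s-{seg.end:.0f}s: {seg.speaker}")
```

The turn count is what keeps guests out: a guest who happens to talk for 30s straight still has other turns elsewhere in the episode. It would obviously miss ads read by the host, and ads that reuse the same voice more than a couple of times.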