by thinkersilver 2478 days ago
If AWS gets you going, definitely go for it, but if you're looking for existing tools, search GitHub for 'python speaker diarization'.

Most of what you'll find requires downsampling your audio to 16 kHz, and you'll see a mix of NN-based diarizers and older HMM-based models from before the deep learning era.
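
To give a rough idea of what the 16 kHz downsampling step involves, here's a minimal pure-Python sketch. The function name `resample_linear` is made up for illustration, and it assumes mono audio already loaded as a list of numbers. It uses plain linear interpolation with no anti-aliasing filter, so treat it as a toy; for real work you'd likely reach for a proper resampler in whatever audio tool you already use.

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Naively resample mono samples via linear interpolation.

    Hypothetical helper for illustration only: it applies no low-pass
    (anti-aliasing) filter, so high frequencies can alias when downsampling.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []

    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)   # sample at or before the position
        hi = min(lo + 1, last)     # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# One second of 48 kHz silence becomes 16000 samples.
one_second = resample_linear([0] * 48000, 48000)
```

Going from 48 kHz to 16 kHz is an exact factor of 3, so the sketch just keeps every third sample; other ratios (44.1 kHz, say) fall between samples and get interpolated.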

One thing to note: a lot of these systems work well for interviews, broadcast media, and footage taken from a camera, because the audio tends to be clean.

Films will be a challenge because the background music gets identified as a separate voice. It tends to confuse the diarizer.

I haven't tried the cloud-based systems on audio with the kind of background sounds you tend to find in films.