by agwp 277 days ago
Have you explored using speaker diarization and speaker identification, given that pyannote and similar tools take this approach?

I ask because of your decision to capture speaker names from the screen. I see the merits during desktop recording, but I can also see how this limits utility when trying to offer the same functionality across desktop and other scenarios (e.g. in-person meetings, audio uploads).

1 comment

We already support diarization in the Desktop Recording SDK by capturing the meeting platform’s speaker-change events, so you get a diarized transcript plus precise “speaker started talking” timestamps out of the box. We also support voice-signature diarization via third-party STT providers for participants calling in from the same room.
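
To make the event-based approach concrete, here is a rough sketch of how speaker-change events could be combined with timestamped words to produce a diarized transcript. This is not the SDK's actual API: `SpeakerEvent`, `Word`, and `diarize` are hypothetical names invented for illustration, and it assumes both streams share one clock measured in seconds.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class SpeakerEvent:
    # Hypothetical shape of a "speaker started talking" event.
    start: float  # seconds at which this speaker started talking
    name: str     # display name reported by the meeting platform


@dataclass
class Word:
    # Hypothetical shape of a timestamped word from a transcript.
    start: float  # seconds at which the word begins
    text: str


def diarize(events, words):
    """Label each word with the most recent speaker-change event, then
    merge consecutive words from the same speaker into one turn."""
    events = sorted(events, key=lambda e: e.start)
    starts = [e.start for e in events]
    turns = []
    for w in sorted(words, key=lambda w: w.start):
        # Index of the last event that started at or before this word.
        i = bisect_right(starts, w.start) - 1
        speaker = events[i].name if i >= 0 else "unknown"
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(w.text)
        else:
            turns.append((speaker, [w.text]))
    return [(speaker, " ".join(texts)) for speaker, texts in turns]


events = [SpeakerEvent(0.0, "Alice"), SpeakerEvent(2.0, "Bob")]
words = [Word(0.1, "hi"), Word(0.5, "there"), Word(2.1, "hello")]
print(diarize(events, words))
```

Because the speaker label comes from the platform's event rather than from the audio, this only works where such events exist, which is why in-person meetings and audio uploads need a different technique.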

For in-person meetings and audio uploads, this is on our roadmap and in development. More to come on this!