Hacker News new | ask | show | jobs
by meerab 334 days ago
I am building VideoToBe.com - I have found that whisperX works the most reliable.

https://github.com/m-bain/whisperX

It is built on top of OpenAI Whisper, so speech recognition is good, the transcript gives speaker tags as 'SPEAKER_00' and 'SPEAKER_01' etc.

Here is how the transcript may look like

https://videotobe.com/play/media/1b02f75a-9503-43aa-8956-d18...