Hacker News new | ask | show | jobs
by 867-5309 1483 days ago
great project! since it relies heavily on subtitle files, and as an alternative to generating your own, which websites would you recommend to find subtitles for videos which are not on youtube i.e. movies and series? preferably ones with ratings systems similar to guitar tabs websites - I can envisage a musical similarity in the variance and quality of user-submitted content e.g. timing, volume, tone, punctuation, expression, improvisation, etc. since I doubt many are composed from the actual scripts. I have never used vosk so am also wondering whether it would be quicker and more reliable than filtering and spot checking say a few subtitle files per video
1 comments

I just started playing around with the transcription part after seeing this blog post. Consider giving it a try.

I'm not sure how well most subtitle sources will work with this. I don't think they'll generally embed the word timings needed for picking out fragments (just line timings). The blog post mentions it being the case for `.srt` specifically. Not 100% sure, someone with better understanding of the subtitle formats would be able to correct me.

FWIW I'm finding the video transcription to be working quite well (and I even decided to use Japanese-speaking media because I wanted to see how well vosk handles it).

It might be my system, but the transcription is unfortunately a bit slow/single threaded. I quickly added a GNU `parallel` in front of the transcription step to speed up processing an entire season.

thank you for the info

I hope the subtitles website I am searching for will provide multiple formats and I understand a lot more effort would be required to produce the .vtt with word fragments. running a diff on vosk text and subtitle file text might help to iron out ambiguities

I will at some point try vosk (with parallel!)