by 867-5309
1483 days ago
Great project! Since it relies heavily on subtitle files, and as an alternative to generating your own, which websites would you recommend for finding subtitles for videos that are not on YouTube, i.e. movies and series? Preferably ones with rating systems similar to guitar tab websites. I can envisage a musical similarity in the variance and quality of user-submitted content (e.g. timing, volume, tone, punctuation, expression, improvisation), since I doubt many are composed from the actual scripts. I have never used vosk, so I am also wondering whether it would be quicker and more reliable than filtering and spot-checking, say, a few subtitle files per video.
|
I'm not sure how well most subtitle sources will work with this. I don't think they generally embed the word timings needed for picking out fragments, just line timings. The blog post mentions this being the case for `.srt` specifically. I'm not 100% sure, so someone with a better understanding of subtitle formats may be able to correct me.
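To illustrate the line-timing point, here is a minimal sketch of reading `.srt` cues in Python. The sample cues and the names `SAMPLE_SRT`, `TIMING`, `to_seconds` and `parse_srt` are made up for illustration and are not taken from the project. It assumes plain, well-formed cues (index line, timing line, text lines) and ignores styling tags and other extensions.

```python
import re

# Illustrative sample only: each cue is an index line, a timing line,
# then one or more lines of text.
SAMPLE_SRT = """\
1
00:00:01,000 --> 00:00:03,500
Hello there, how are you?

2
00:00:04,000 --> 00:00:05,250
Fine, thanks.
"""

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" at the start of the timing line.
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3}) --> (\d+):(\d{2}):(\d{2}),(\d{3})"
)


def to_seconds(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, cue_text) tuples."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # not a complete cue
        match = TIMING.match(lines[1].strip())
        if not match:
            continue  # skip anything without a valid timing line
        fields = match.groups()
        start = to_seconds(*fields[:4])
        end = to_seconds(*fields[4:])
        cues.append((start, end, " ".join(lines[2:])))
    return cues


for start, end, text in parse_srt(SAMPLE_SRT):
    # One start/end pair covers the whole cue; no per-word offsets exist.
    print(f"{start:.2f}-{end:.2f}: {text}")
```

Each cue carries one start time and one end time for all of its text, so locating a single word inside a cue would need either guessing or a separate alignment or transcription pass.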
FWIW I'm finding the video transcription to be working quite well (and I even decided to use Japanese-language media because I wanted to see how well vosk handles it).
It might be my system, but the transcription is unfortunately a bit slow and single-threaded. I quickly put GNU `parallel` in front of the transcription step to speed up processing an entire season.
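For anyone without GNU `parallel`, the same idea can be sketched in Python: start one transcription process per episode and cap how many run at once. This is not what I actually ran. `run_in_parallel` and `placeholder_command` are hypothetical names, and the placeholder just echoes the path, so swap in whatever command performs your transcription step.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def placeholder_command(path):
    """Stand-in for the real transcription command (an assumption).

    It only echoes the path back, so the sketch runs anywhere.
    """
    return [sys.executable, "-c", "import sys; print(sys.argv[1])", path]


def run_in_parallel(paths, build_command, workers=4):
    """Run one external command per path, at most `workers` at a time.

    Returns a dict mapping each path to its process exit code.
    """
    def run_one(path):
        completed = subprocess.run(
            build_command(path), capture_output=True, text=True
        )
        return path, completed.returncode

    # Threads are enough here: the heavy work happens in the child
    # processes, and each thread just waits for its process to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_one, paths))


if __name__ == "__main__":
    # Usage: python this_script.py episode01.mkv episode02.mkv ...
    for path, code in run_in_parallel(sys.argv[1:], placeholder_command).items():
        print(f"{path}: exit code {code}")
```

Setting `workers` near the number of CPU cores is a reasonable starting point if each transcription process really is single-threaded.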