by Darkphibre 1961 days ago
Interesting!

I've been thinking of running OCR on video frames. I'd also like to do speech-to-text extraction so I can search my archives later (I have about 4TB of video to trawl through and want text-based search). It's an interesting space to explore, but everything's been moving to web services with cost-prohibitive pricing.

4 comments

Should be able to use ffmpeg[0] to extract a single frame each second/keyframe (doubtful it's worth doing every single frame) and then pass it to tesseract.

For speech-to-text, if it's English, try Mozilla's DeepSpeech? https://github.com/mozilla/DeepSpeech

Might be fun to try.

[0] https://stackoverflow.com/questions/27568254/how-to-extract-...
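
A rough sketch of the ffmpeg-then-tesseract idea in Python: write one frame per second out as PNGs, then pass each one to the tesseract CLI. It assumes both binaries are on PATH. The function names, the `frame_%06d.png` naming pattern and the paths are made up for illustration, so treat it as a starting point rather than a tested pipeline.

```python
import subprocess
from pathlib import Path


def ffmpeg_frame_cmd(video, out_dir, fps=1):
    """Build an ffmpeg command that writes `fps` frames per second as PNGs."""
    pattern = str(Path(out_dir) / "frame_%06d.png")  # hypothetical naming scheme
    return ["ffmpeg", "-i", str(video), "-vf", f"fps={fps}", pattern]


def tesseract_cmd(image):
    """Build a tesseract command that prints the recognized text to stdout."""
    return ["tesseract", str(image), "stdout"]


def ocr_video(video, out_dir, fps=1):
    """Extract frames, OCR each one, and return {frame filename: text}."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_frame_cmd(video, out_dir, fps), check=True)
    results = {}
    for frame in sorted(Path(out_dir).glob("frame_*.png")):
        done = subprocess.run(tesseract_cmd(frame), check=True,
                              capture_output=True, text=True)
        results[frame.name] = done.stdout.strip()
    return results
```

Something like `ocr_video("talk.mp4", "frames")` (file name hypothetical) would give you a dict you could dump into whatever search index you like.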

Yup, was planning to use ffmpeg (or, more likely, OpenCV), and a subset of the frames.

Thanks so much for the tip on DeepSpeech!
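
For the OpenCV route, picking a subset of frames might look roughly like the sketch below. It assumes the `opencv-python` package; `sample_indices` and `sample_frames` are hypothetical helper names, and the 30 fps fallback is an arbitrary guess for files that don't report a frame rate.

```python
def sample_indices(total_frames, native_fps, every_seconds=1.0):
    """Frame indices spaced roughly `every_seconds` apart."""
    step = max(1, round(native_fps * every_seconds))
    return list(range(0, total_frames, step))


def sample_frames(video_path, every_seconds=1.0):
    """Yield (index, frame) pairs for the sampled subset of frames."""
    import cv2  # opencv-python; imported here so sample_indices works without it

    cap = cv2.VideoCapture(str(video_path))
    try:
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # assumed fallback if fps is unknown
        for idx in sample_indices(total, fps, every_seconds):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the chosen frame
            ok, frame = cap.read()
            if not ok:
                break
            yield idx, frame
    finally:
        cap.release()
```

Seeking by frame index can be slow or inexact with some codecs, so reading sequentially and skipping frames may work out better in practice.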

@Darkphibre, we are happy to provide you with an AI that takes in a video and outputs OCR and speech-to-text. With Base64.ai, you don't have to worry about the implementation details and can focus on your projects. Shall we set up a meeting to discuss more? https://base64.ai/meeting

For speech-to-text extraction you can try Silero [1].

Free software (AGPL-3.0 License), fast, highly accurate and extremely simple to deploy (I have no affiliation with them).

[1] https://github.com/snakers4/silero-models

Thanks for the heads up! Will definitely check it out.

If you’re looking to index/process video, maybe we can help. Check out Vidrovr (https://vidrovr.com).

Full disclosure: I'm one of the founders.