So it's still just analysing the transcript, not using GPT-4V or OCR in any way?
Can you confirm if I could skip using VideoDB by using Whisper to transcribe the video, and then use that transcript with LLaMa to extract the important parts?
It analyses the transcript, but there is no way to get back the video clip without building your own video infrastructure. We at VideoDB are solving exactly this problem.
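To make the "own video infra" part concrete, here is a minimal sketch of the glue you would need on the do-it-yourself route. It assumes the transcript arrives as a list of segments with `start`, `end` (seconds) and `text` fields, which is the shape Whisper's segment output takes. Everything else is hypothetical and for illustration only: `select_segments` is a keyword-based stand-in for the LLM step, and `merge_ranges` and `ffmpeg_cut_command` are made-up helper names. The sketch only builds an ffmpeg argument list; it does not run it.

```python
from typing import Dict, List, Tuple

# Assumed segment shape: {"start": float, "end": float, "text": str}
Segment = Dict[str, object]


def select_segments(segments: List[Segment], keywords: List[str]) -> List[Segment]:
    """Hypothetical stand-in for the LLM step: keep segments mentioning a keyword."""
    lowered = [k.lower() for k in keywords]
    return [
        seg for seg in segments
        if any(k in str(seg["text"]).lower() for k in lowered)
    ]


def merge_ranges(segments: List[Segment], max_gap: float = 1.0) -> List[Tuple[float, float]]:
    """Merge selected segments into clip ranges, joining ones less than max_gap apart."""
    ranges: List[Tuple[float, float]] = []
    for seg in sorted(segments, key=lambda s: float(s["start"])):
        start, end = float(seg["start"]), float(seg["end"])
        if ranges and start - ranges[-1][1] <= max_gap:
            # Close enough to the previous range: extend it instead of starting a new clip.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


def ffmpeg_cut_command(src: str, start: float, end: float, dst: str) -> List[str]:
    """Build (but do not run) an ffmpeg command that copies one time range to a new file."""
    return [
        "ffmpeg", "-i", src,
        "-ss", f"{start:.2f}", "-to", f"{end:.2f}",
        "-c", "copy",  # stream copy: fast, but cuts land on keyframes, not exact times
        dst,
    ]
```

You would call `merge_ranges(select_segments(segments, ["pricing"]))` and then pass each range to `ffmpeg_cut_command`. Storage, indexing, and streaming of the resulting clips are still on you, which is the part the reply above is pointing at.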