Btw, it doesn't really sound like the problem needs video as input to the LLM. Feels like sending an image would be enough, so that should make it less demanding(?)