by falloutx
153 days ago
A novel is different from a codebase. In code, files relate to each other, and most of them can be ignored depending on what you're doing. A novel is sequential: in most cases A leads to B, B leads to C, and so on.

> Re: movies. Get YouTube premium and ask YouTube to summarize a 2hr video for you.

This is different from watching a movie. Can it tell what suit the actor was wearing? Can it tell what the actor's face looked like? Summarising and watching are two different things.
https://github.com/JUNJIE99/MLVU
https://huggingface.co/datasets/OpenGVLab/MVBench
Ovis and Qwen3-VL are examples of models that can take multiple frames from a video at once and produce both visual and temporal understanding.
https://huggingface.co/AIDC-AI/Ovis2.5-9B
https://github.com/QwenLM/Qwen3-VL
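
For anyone wondering what "multiple frames at once" looks like in practice, here is a rough sketch of one common way to pick the frames: uniform sampling across the whole video. This is my own illustration, not the actual preprocessing used by Ovis or Qwen3-VL; the function name `sample_frame_indices` and the frame counts are hypothetical.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick evenly spaced frame indices covering the whole video.

    Hypothetical helper for illustration only; real models may sample
    by FPS, by scene change, or with a token budget instead.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    # Never ask for more frames than the video has.
    num_samples = min(num_samples, total_frames)
    # Split the video into equal segments and take each segment's midpoint.
    segment = total_frames / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]


# A 2hr video at 24 fps has 2 * 60 * 60 * 24 = 172800 frames.
indices = sample_frame_indices(172_800, 32)
print(len(indices), indices[0], indices[-1])
```

The frames at those indices are then passed to the model together, in order, which is what lets it answer questions about both appearance (what suit?) and sequence (what happened before what?). With only a few dozen frames from a 2hr movie, plenty of detail is still skipped, so the number of sampled frames matters a lot.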