|
|
|
|
|
by keepamovin
44 days ago
|
|
Yeah, actually I think that’s really smart. Because after you convert everything to JPEG everything is just an image that you can ask LLMs to look at. Unfortunately, I don’t have the experience with local models, but if someone wants to point me in like the right direction or send me an email to collab. |
|
In my experience, it takes about 15 to 30 seconds per image, but the quality of the results is quite good if a bit verbose [2].
[1] - https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct-FP8
[2] - https://mordenstar.com/other/vlm-xkcd