by jn2clark
891 days ago
Can anyone recommend an open-source multimodal LLM that can produce structured outputs from an image? I have not found a good open-source one yet (this one included); it seems only closed-source models can do this reliably. Any suggestions are very welcome!
https://imgur.com/a/hPAaZUv
https://huggingface.co/spaces/Qwen/Qwen-VL-Plus
You can also ask it to give you bounding boxes of objects.