| HN Mirror



	by jn2clark 938 days ago
	Can anyone recommend an open source multi-modal LLM that can produce structured outputs from an image? I have not found a good open source one yet (this one included); it seems that only closed source models can do this reliably well. Any suggestions are very welcome!

2 comments

Something like this?

You can also ask it to give you bounding boxes of objects.
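For what it's worth, here is a rough sketch of how the bounding-box text could be turned into structured output afterwards. It assumes the model replies with lines like "dog: [0.10, 0.20, 0.55, 0.90]", i.e. a label followed by normalized x1, y1, x2, y2 coordinates. That reply format, the `parse_boxes` name, and the image size are assumptions for illustration, so check what your model actually emits before relying on it.

```python
import re

# Assumed reply format (hypothetical): "label: [x1, y1, x2, y2]" with
# coordinates normalized to the 0..1 range. Adjust the pattern to your model.
BOX_RE = re.compile(
    r"(?P<label>[A-Za-z][\w ]*?)\s*:\s*\[\s*"
    r"(?P<coords>[-+]?\d*\.?\d+(?:\s*,\s*[-+]?\d*\.?\d+){3})\s*\]"
)


def parse_boxes(reply, width, height):
    """Extract labeled boxes from free text and scale them to pixel coordinates."""
    boxes = []
    for match in BOX_RE.finditer(reply):
        x1, y1, x2, y2 = (float(v) for v in match.group("coords").split(","))
        # Drop anything outside the normalized range or with inverted corners,
        # since model output is not guaranteed to be well formed.
        if not all(0.0 <= v <= 1.0 for v in (x1, y1, x2, y2)):
            continue
        if x2 <= x1 or y2 <= y1:
            continue
        boxes.append({
            "label": match.group("label").strip(),
            "box": [round(x1 * width), round(y1 * height),
                    round(x2 * width), round(y2 * height)],
        })
    return boxes


if __name__ == "__main__":
    # Made-up reply text, only to show the call; not real model output.
    reply = "dog: [0.10, 0.20, 0.55, 0.90]\ncat: [0.6, 0.1, 0.9, 0.5]"
    print(parse_boxes(reply, width=640, height=480))
```

Validating and discarding malformed boxes, rather than trusting the text, is the main point here: open models tend to drift from the requested format, so the parser has to be the strict part.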

I've only used LLaVA / BakLLaVA. It falls under the Llama 2 Community License. Not sure whether you'd consider that open source or not.