HN Mirror

Hacker News


	by sbierwagen 938 days ago
	> spatial component is hard to describe in text

	IDEFICS, MiniGPT-4, NExT-GPT and LLaVA are open-source multimodal LLMs that can read images.

1 comment

crashmat 938 days ago

Yes, but do they get a vague idea of what's onscreen or could they really see what's going on in each tile, keep track of all the stats and use those to inform their decisions?
