Y
Hacker News
new
|
ask
|
show
|
jobs
by
sbierwagen
938 days ago
>spatial component is hard to describe in text
Idefics, mingpt4, Next-GPT and LLava are open source multimodal LLMs that can read images.
1 comments
crashmat
938 days ago
Yes, but do they get a vague idea of what's onscreen or could they really see what's going on in each tile, keep track of all the stats and use those to inform their decisions?
link