|
|
|
|
|
by tootyskooty
333 days ago
|
|
Honestly might be more indicative of how far behind vision is than anything. Despite the fact that CV was the first real deep learning breakthrough VLMs have been really disappointing. I'm guessing it's in part due to basic interleaved web text+image next token prediction being a weak signal to develop good image reasoning. |
|
https://annas-archive.org/blog/critical-window.html
I hope one of these days one of these incredibly rich LLM companies accidentally solves this or something, would be infinitely more beneficial to mankind than the awful LLM products they are trying to make