by nextzck 107 days ago
I get excited for every new vision model, especially those that work better and more efficiently. Vision is where we are so very far behind. I can’t wrap my head around it.

What do you mean far behind? Far behind what? The new Qwen (actually the old one too) can give you bounding rectangular prisms around things in a scene, correctly OCR text with ink spilled on it, read graphs, and understand spatial relationships. I think it's pretty impressive for something I'm running on like a 5-year-old GPU.
yeah i know lol, that’s kind of my point. impressive that it runs on your gpu, but it still can’t tell you what happens if you tilt a glass. that’s what world models are working toward. but even then... so what? you get a perfect simulator. it knows the glass tips. it still doesn’t know why someone tipped it, or what happens if they don’t. a four-year-old can do this and we’re just barely on step one and a half.