HN Mirror



	by tskj 30 days ago
	This is also my gripe with a lot of this stuff: always evaluating models on what they can literally one-shot is completely pointless. It's not how anything works, neither for humans nor for scaffolded AIs. I guess it's neat if you want to argue that a certain level of intelligence can "never be achieved" in a single forward pass, but like, so what? No one cares about that except people who have already decided to be anti-AI. (Not that I am in any sense pro-AI, but it's just a weird lack of intellectual rigor.)


irthomasthomas 30 days ago

Asking a model to improve its output is not one-shotting, though? My observation was that asking an LLM to iterate and improve a response causes it to add more stuff rather than repair the broken stuff, and that model progress in general follows the same pattern: this new model adds more details to its responses but continues to make mistakes at about the same rate.


losvedir 30 days ago

The question was whether you were giving it the rendered image and using the model's vision capability, or feeding the SVG text back in.

It's hard to "imagine" what the rendered SVG looks like, for both humans and LLMs, so just iterating on text won't really be as useful a test. But if you show it what it rendered, it might observe the bad-looking bicycle and be able to fix the text that way.
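
For anyone who wants to try both variants, here is a rough sketch of the two loops being contrasted: text-only iteration versus iteration with the rendered image fed back. Everything in it is an assumption for illustration: `refine_svg`, `generate`, and `render` are hypothetical names, not any real model or rendering API. You would plug in your own model call and your own SVG rasterizer.

```python
from typing import Callable, Optional

# Hypothetical signatures (assumptions, not a real API):
#   generate(prompt, previous_svg, rendered_image) -> new SVG text
#   render(svg_text) -> raster image bytes (e.g. PNG)
Generate = Callable[[str, Optional[str], Optional[bytes]], str]
Render = Callable[[str], bytes]


def refine_svg(
    prompt: str,
    generate: Generate,
    render: Optional[Render] = None,
    rounds: int = 3,
) -> list[str]:
    """Return every SVG draft produced, oldest first.

    With render=None the model only ever sees its previous SVG text.
    With a renderer, each round also passes the rendered image back,
    so a vision-capable model can look at what it actually drew.
    """
    # First draft is the plain one-shot attempt: no feedback at all.
    drafts = [generate(prompt, None, None)]
    for _ in range(rounds):
        previous = drafts[-1]
        image = render(previous) if render is not None else None
        drafts.append(generate(prompt, previous, image))
    return drafts
```

Keeping all the drafts rather than only the last one makes it possible to check whether later rounds repair earlier mistakes or just add more detail.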


irthomasthomas 29 days ago

"I've even experimented with feeding the broken pelican svgs to an image model to look for flaws, and they still fail to spot the broken elements."
