by ffsm8 58 days ago

Clearly not.

I mean the prompt was succinct and clear, as always - and it still decided to hallucinate multiple features (animation + controls) beyond the prompt.

I'd also like to point out that, to date, no drawing has actually been good from a quality perspective (as in, compared to what a decent designer would throw together).

They're only ever "good" considering they come from a one-shot, low-effort prompt. Very little content for training purposes.

2 comments

nwienert 57 days ago

The way I’ve come to think of LLMs is that what they produce in a single reply, even with thinking turned up, is akin to what you’d do in a single short session of work.

And so if you ask it to do something big it will do a very surface level implementation. But if you have it iterate many times, or give it small pieces each time, you’ll end up with something closer to what a human would do.

I imagine the pelican test but done in a harness that has the agents iterate 10+ times would be closer to what you’d expect, especially if a visual model was critiquing each time.
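A rough sketch of what such a harness might look like, purely as an illustration. The names `refine`, `generate`, and `critique` are hypothetical, and the toy stand-ins below only fake the model and the visual critic so the loop can run without any API:

```python
from typing import Callable, Tuple

def refine(prompt: str,
           generate: Callable[[str, str, str], str],
           critique: Callable[[str, str], Tuple[float, str]],
           rounds: int = 10) -> Tuple[str, float]:
    """Generate, critique, regenerate; return the best-scoring draft."""
    best_draft, best_score = "", float("-inf")
    draft, feedback = "", ""
    for _ in range(rounds):
        # Each round sees the previous draft plus the critic's feedback.
        draft = generate(prompt, draft, feedback)
        score, feedback = critique(prompt, draft)
        if score > best_score:
            best_draft, best_score = draft, score
    return best_draft, best_score

# Toy stand-ins (assumptions, not a real model or vision critic).
PARTS = ["pelican", "bicycle", "wheels", "beak"]

def toy_generate(prompt: str, previous: str, feedback: str) -> str:
    # Pretend the model adds whatever the critic asked for last round.
    if not feedback:
        return previous or "<svg>"
    return previous + f"<!-- {feedback} -->"

def toy_critique(prompt: str, draft: str) -> Tuple[float, str]:
    # Pretend critic: score by how many expected parts are present,
    # and ask for the first missing one.
    missing = [p for p in PARTS if p not in draft]
    return float(len(PARTS) - len(missing)), (missing[0] if missing else "")

if __name__ == "__main__":
    draft, score = refine("pelican riding a bicycle", toy_generate, toy_critique)
    print(score, draft)
```

In a real setup `generate` would call the LLM and `critique` would render the SVG and hand the image to a vision model; both are left as plug-in callables here because the thread doesn't specify either.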

slopinthebag 57 days ago

Yeah, this is how I use AI. Instead of a single session one-shot, it's usually limited to single targeted edits, and then I steer it on each step. Takes longer but the output is actually what I want.

serial_dev 57 days ago

What does good even mean… I have no idea what a good “pelican on a bike” should look like. It’s a fun prompt because there are no good answers… at least so I thought.

abustamam 57 days ago

Yeah that was exactly Simon's intent. https://simonwillison.net/2025/Nov/13/training-for-pelicans-...

ffsm8 57 days ago

There are countless examples of animals riding bicycles etc. in the comic books I grew up with.

It would always look goofy - by design, but it usually looked good.
