|
|
|
|
|
by thatwasunusual
196 days ago
|
|
> We are getting to the point that its not unreasonable to think that "Generate an SVG of a pelican riding a bicycle" could be included in some training data. I may be stupid, but _why_ is this prompt used as a benchmark? I mean, pelicans _can't_ ride a bicycle, so why is it important for "AI" to show that they can (at least visually)? The "wine glass problem"[0] - and probably others - seems to me to be a lot more relevant...? [0] https://medium.com/@joe.richardson.iii/the-curious-case-of-t... |
|
Honestly though, the benchmark was originally meant to be a stupid joke.
I only started taking it slightly more seriously about six months ago, when I noticed that the quality of the pelican drawings really did correspond quite closely to how generally good the underlying models were.
If a model draws a really good picture of a pelican riding a bicycle there's a solid chance it will be great at all sorts of other things. I wish I could explain why that was!
If you start here and scroll through and look at the progression of pelican on bicycle images it's honestly spooky how well they match the vibes of the models they represent: https://simonwillison.net/2025/Jun/6/six-months-in-llms/#ai-...
So ever since then I've continue to get models to draw pelicans. I certainly wouldn't suggest anyone take serious decisions on model usage based on my stupid benchmark, but it's a fun first-day initial impression thing and it appears to be a useful signal for which models are worth diving into in more detail.