by lacoolj 82 days ago
So it's the age of AI, and this seems like a great new benchmark! Lots of structured text, but each item is a separate "task", and each one requires its own new image plus a textual representation.

I copy-pasted the whole article (minus the few included images) and added this prompt in Gemini 3 Pro:

> Take each of the following and add an image representing the act being described. The image should be very basic. Think of signs in buildings - exit signs, bathroom door signs, no smoking signs, etc. That style of simplicity. Just simple, flat, elegant vector graphic lines for the chopsticks, hands, bowls, etc.

Google Gemini output: https://gemini.google.com/share/11df1bc53e3d

I think this is pretty dang good for a one-shot run. I also ran this through Claude Opus 4.6 Extended (doesn't generate images directly, so it made an HTML page and some vector icons). Not as good as Gemini IMO. See here if curious: https://claude.ai/public/artifacts/8b6589b3-4da4-4fd5-b862-c...

Anyone able to do this better with a different prompt or model (or both)?

1 comment

Nice that you discovered LLMs, welcome.

But next time, keep your findings for a thread about the wonders of LLMs, not an unrelated one such as this one about chopsticks.