Hacker News
by pineapple_opus 35 days ago
All I see are mentions of how various models generate images of "pelican riding bicycle(s)"
3 comments

Yes, the "pelican riding a bicycle" is the ultimate test of not understanding how LLMs work.

Well, a combination of that and believing that replication of test data is a good measure of progress.

Spicy — why does it show ultimate non-understanding?
Because success comes from reproducing a memorized pattern rather than from transferable reasoning?

At the same time, failure proves little, because most humans also could not manually create a correct SVG of a pelican riding a bicycle.

What is it exactly that such a test is testing?

In which situation would you measure the "competence" of a human being by asking them to write an SVG of a pelican riding a bicycle?

> most humans also could not manually create a correct SVG of a pelican riding a bicycle.

Most humans absolutely can create this with a suitable vector graphics tool such as Inkscape or Illustrator.

Surely, you're not suggesting that a fair comparison would be using a text editor?

If so, would you suggest that an equivalent raster-based task would only be fair if the human manually assigned RGB values to each pixel?

We all know the true test of AI is Will Smith eating spaghetti.
Wait, are you saying you don't handcraft SVGs of pelicans riding bicycles?