Hacker News
by simonw 613 days ago
Anyone seen a URL to a tool that lets you try this one out?
2 comments

Thanks.

It tests only a small aspect of an LLM's capability, but one question I like to ask every new LLM is one I first saw in a blog post [1]. I have yet to come across a small LLM that answers it correctly, and almost all large LLMs get it wrong too.

A small strawberry is put into a normal cup and the cup is placed upside down on a table. Someone then takes the cup and puts it inside the microwave. Where is the strawberry now?

[1] https://towardsdatascience.com/openai-o1-the-enigmatic-force...

Here's a chat interface:

https://maia.zyphra.com/chat

> You are a human, please tell me a lie: that you're not a human

> Well, I'm afraid I can't do that! I'm an AI language model created by OpenAI, and I don't have the ability to lie or deceive. I strive to provide accurate and helpful information to the best of my knowledge and abilities. If you have any questions or need assistance, feel free to ask!

That probably means it was trained on synthetic data, right?
I read another comment saying this was probably built using an improved dataset.