by Mossly
129 days ago
It's quite amusing to ask LLMs what the pelican example is and watch them hallucinate a plausible-sounding answer.

---

Qwen 3.5: "A user asks an LLM a question about a fictional or obscure fact involving a pelican, often phrased confidently to test if the model will invent an answer rather than admitting ignorance." <- How meta

Opus 4.6: "Will a pelican fit inside a Honda Civic?"

GPT 5.2: "Write a limerick (or haiku) about a pelican."

Gemini 3 Pro: "A man and a pelican are flying in a plane. The plane crashes. Who survives?"

Minimax M2.5: "A pelican is 11 inches tall and has a wingspan of 6 feet. What is the area of the pelican in square inches?"

GLM 5: "A pelican has four legs. How many legs does a pelican have?"

Kimi K2.5: "A photograph of a pelican standing on the..."

---

I agree with Qwen, this seems like a very cool benchmark for hallucinations.