by kqr 10 days ago
But LLMs are terrible at text adventures too. See e.g. https://entropicthoughts.com/updated-llm-benchmark and previous articles referenced in there.

I have yet to see any sort of harness that lets a frontier LLM interact with a text adventure and make meaningful progress on its own.


To pile on, they're also bad at games played in 2D text-based environments.

ARC-AGI-3 shows this: https://arcprize.org/arc-agi/3

I've done some work as well on Rogue (sorry for self-promotion): https://iwhalen.github.io/rogue-bench/

There is no "2D text" processing when it comes to LLMs. They process text as ordinary, sequential 1D text only. And humans process "2D text" like any other 2D image. So 2D text isn't really a thing in any case. Saying LLMs are bad at 2D text is like saying that humans are bad at 2D audio.
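
To make the "sequential 1D text" point concrete, here is a minimal sketch. The map, the `flat_index` helper, and the use of character positions as a stand-in for token positions are all assumptions for illustration; real tokenizers group characters differently, so actual distances will vary.

```python
# Hypothetical 3x5 map: '@' is the player, '>' is the exit directly below it.
grid = [
    "#####",
    "#.@.#",
    "#.>.#",
]

# What the model is actually given: the rows joined into one linear string.
flat = "\n".join(grid)

def flat_index(row, col, width):
    # Each row contributes `width` characters plus one newline separator.
    return row * (width + 1) + col

width = len(grid[0])
player = flat_index(1, 2, width)
below = flat_index(2, 2, width)
right = flat_index(1, 3, width)

# Distance in the flattened sequence between the player and each neighbour.
print("vertical neighbour offset:", below - player)
print("horizontal neighbour offset:", right - player)
```

The cell to the right is adjacent in the sequence, while the cell directly below sits a full row width plus one newline away, so "up" and "down" are not local relationships in the input the way they are in an image.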
They are also pretty bad at navigating mazes (which can be somewhat similar in spirit to text adventures where you need to navigate through text): https://arxiv.org/abs/2507.20395