HN Mirror



	by kqr 58 days ago
	But LLMs are terrible at text adventures too. See e.g. https://entropicthoughts.com/updated-llm-benchmark and previous articles referenced in there. I have yet to see any sort of harness that lets a frontier LLM interact with a text adventure and make meaningful progress on its own.

2 comments

iwhalen 58 days ago

To pile on, they're also bad at games set in 2D text-based environments.

ARC-AGI-3 shows this: https://arcprize.org/arc-agi/3

I've done some work as well on Rogue (sorry for self-promotion): https://iwhalen.github.io/rogue-bench/


cubefox 57 days ago

There is no "2D text" processing when it comes to LLMs. They process text as ordinary, sequential 1D text only. And humans process "2D text" like any other 2D image. So 2D text isn't really a thing in any case. Saying LLMs are bad at 2D text is like saying that humans are bad at 2D audio.
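To make the 1D point concrete, here is a minimal sketch in Python. The map and the helper `flat_index` are made up for illustration, and it counts characters, not tokens (a real tokenizer would give different offsets), but the effect is the same: cells that are vertically adjacent on screen end up a full row apart in the sequence the model reads.

```python
# Hypothetical 5x3 roguelike map: '@' is the player, '>' is the stairs
# located directly below the player on screen.
grid = [
    "#####",
    "#.@.#",
    "#.>.#",
]

# What a text-only model receives: one string, rows joined by newlines.
flat = "\n".join(grid)


def flat_index(row: int, col: int, width: int) -> int:
    """Offset of (row, col) in the flattened string.

    Each row contributes `width` characters plus one newline separator.
    """
    return row * (width + 1) + col


width = len(grid[0])
player = flat_index(1, 2, width)
stairs = flat_index(2, 2, width)

# Adjacent in 2D, but separated by a whole row in the 1D sequence.
print(flat[player], flat[stairs], stairs - player)
```

So "directly below" is not a local relation in the input: it is an offset of `width + 1` characters that has to be inferred from where the newlines fall.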


haffi112 58 days ago

They are also pretty bad at navigating mazes (which can be somewhat similar in spirit to text adventures where you need to navigate through text): https://arxiv.org/abs/2507.20395
