HN Mirror


by ceheaaf 58 days ago

It feels like they're really focusing on overstating how confusing and weird it is that an LLM can write code but not play games very well, rather than just explaining it.

Code is text. LLMs are text input/output machines.

Game input/output is not at all text.

LLMs can certainly reason about games with a simple/explicit enough domain (try a Risk tournament where models can talk to each other between turns!)

3 comments

cubefox 58 days ago

The other reason is lack of continual learning, especially for long games like RPGs.


kqr 58 days ago

But LLMs are terrible at text adventures too. See e.g. https://entropicthoughts.com/updated-llm-benchmark and previous articles referenced in there.

I have yet to see any sort of harness that lets a frontier LLM interact with a text adventure and make meaningful progress on its own.


iwhalen 58 days ago

To pile on, they're also bad at games that are 2D text-based environments.

ARC-AGI-3 shows this: https://arcprize.org/arc-agi/3

I've done some work as well on Rogue (sorry for self-promotion): https://iwhalen.github.io/rogue-bench/


cubefox 57 days ago

There is no "2D text" processing when it comes to LLMs. They process text as ordinary, sequential 1D text only. And humans process "2D text" like any other 2D image. So 2D text isn't really a thing in any case. Saying LLMs are bad at 2D text is like saying that humans are bad at 2D audio.
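To illustrate the 1D point, here is a minimal sketch. It assumes the 2D map is sent to the model as rows joined by newlines, and it uses character offsets as a rough stand-in for token positions (real tokenizers group characters differently). The grid and the `flat_offsets` helper are made up for illustration.

```python
def flat_offsets(grid: list[str], row: int, col: int) -> dict[str, int]:
    """Offsets, in characters of the newline-joined string, from the cell
    at (row, col) to its right-hand and lower neighbours."""
    width = len(grid[0]) + 1  # +1 for the newline that ends each row
    here = row * width + col
    right = row * width + (col + 1)
    below = (row + 1) * width + col
    return {"right": right - here, "below": below - here}


# Hypothetical roguelike-style map; '@' is the player.
grid = [
    "#######",
    "#@....#",
    "#.....#",
    "#######",
]

# What the model actually receives: one sequential string.
text = "\n".join(grid)

offsets = flat_offsets(grid, 1, 1)
print(offsets)
```

The cell to the right of the player is the next character in the sequence, while the cell directly below is a full row width away. Vertical adjacency is only implicit in the serialized text, so the model has to reconstruct it from those offsets.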


haffi112 58 days ago

They are also pretty bad at navigating mazes (which can be somewhat similar in spirit to text adventures where you need to navigate through text): https://arxiv.org/abs/2507.20395


croes 58 days ago

LLMs are used in OpenClaw and similar tools to do tasks for their users.

Games are a bunch of tasks too.

So if they fail at game tasks, maybe it’s a bad idea to advertise those LLMs as task-doing assistants.
