Hacker News
by ceheaaf 11 days ago
It feels like they're more focused on overstating how confusing and weird it is that an LLM can write code but not play games very well than on just explaining it.

Code is text. LLMs are text input/output machines.

Game input/output is not at all text.

LLMs can certainly reason about games with a simple or explicit enough domain (try a Risk tournament where models can talk to each other between turns!).

3 comments

The other reason is lack of continual learning, especially for long games like RPGs.
But LLMs are terrible at text adventures too. See e.g. https://entropicthoughts.com/updated-llm-benchmark and previous articles referenced in there.

I have yet to see any sort of harness that lets a frontier LLM interact with a text adventure and make meaningful progress on its own.

To pile on, they're also bad at games that are 2D text-based environments.

ARC-AGI-3 shows this: https://arcprize.org/arc-agi/3

I've done some work as well on Rogue (sorry for self-promotion): https://iwhalen.github.io/rogue-bench/

There is no "2D text" processing when it comes to LLMs. They process text as ordinary, sequential 1D text only. And humans process "2D text" like any other 2D image. So 2D text isn't really a thing in any case. Saying LLMs are bad at 2D text is like saying that humans are bad at 2D audio.
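
To make the 1D point concrete, here is a minimal sketch of what happens to a grid when it is serialized for a model. It works on characters, not tokens, and the map and the helper names (`flatten_grid`, `offset`) are made up for illustration; a real tokenizer would group characters differently, but the ordering effect should be similar.

```python
# Hypothetical roguelike-style map; '@' is the player, '>' is the exit.
grid = [
    "#####",
    "#@..#",
    "#.#.#",
    "#..>#",
    "#####",
]


def flatten_grid(rows):
    """Join rows with newlines, giving the single 1D string a model receives."""
    return "\n".join(rows)


def offset(rows, r, c):
    """Position of cell (r, c) in the flattened string (assumes equal-width rows)."""
    width = len(rows[0]) + 1  # +1 for the newline that ends each row
    return r * width + c


text = flatten_grid(grid)
player = offset(grid, 1, 1)

# Distance in the 1D sequence from the player to its two grid neighbours.
right_gap = offset(grid, 1, 2) - player  # cell directly to the right
below_gap = offset(grid, 2, 1) - player  # cell directly below

print(repr(text))
print("right neighbour is", right_gap, "position away")
print("cell below is", below_gap, "positions away")
```

The cell to the right is adjacent in the sequence, while the cell directly below is a full row width away, and that gap grows with the width of the map. A person looking at the grid sees both as equally close.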
They are also pretty bad at navigating mazes (which can be somewhat similar in spirit to text adventures where you need to navigate through text): https://arxiv.org/abs/2507.20395
LLMs are used in OpenClaw and similar tools to do tasks for their users.

Games are a bunch of tasks too.

So if they fail at game tasks, maybe it’s a bad idea to advertise those LLMs as task-doing assistants.