by kqr
10 days ago
But LLMs are terrible at text adventures too. See, e.g., https://entropicthoughts.com/updated-llm-benchmark and the earlier articles referenced there. I have yet to see any harness that lets a frontier LLM interact with a text adventure and make meaningful progress on its own.
ARC-AGI-3 shows this: https://arcprize.org/arc-agi/3
I've also done some work on Rogue (sorry for the self-promotion): https://iwhalen.github.io/rogue-bench/