Hacker News
by throwawayoldie 346 days ago
What do you suppose would happen if you tried it on a game that doesn't have 25 years of walkthroughs written for it?
1 comments

That’s a good point. For 9:05, I expect it would work just as well, since the game helps the user in many ways. The puzzles are of the type “The door is closed”, and you solve them with “open door.”

My suggestion concerns the poor performance DougHaber mentioned: if 9:05 can’t be solved, something else must be wrong with his experiments.

I’ve tried three dozen games, and it’s still hard to find ones suitable for LLM benchmarks. With complex, non-linear text adventures, my guess is that the models get stuck in an endless loop at some point. Hence, I only measure progress over the first hundred steps.
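
To make the "first hundred steps" idea concrete, here is a rough sketch of a capped run. Everything in it is an assumption for illustration: ToyDoorGame is a made-up stand-in for a real interpreter, and stuck_player and helpful_player are hypothetical placeholders for the model. It is not the harness used for the experiments above.

```python
class ToyDoorGame:
    """Hypothetical stand-in for a real game: one closed door, one point."""

    def __init__(self):
        self.door_open = False

    def step(self, command):
        # Returns (response text, current score).
        if command == "open door" and not self.door_open:
            self.door_open = True
            return "You open the door.", 1
        if self.door_open:
            return "The door is open.", 1
        return "The door is closed.", 0


def progress_in_first_steps(game, choose_command, max_steps=100):
    """Best score seen within max_steps commands; the cap ends any endless loop."""
    text, best = "", 0
    for _ in range(max_steps):
        text, score = game.step(choose_command(text))
        best = max(best, score)
    return best


def stuck_player(text):
    # Hypothetical player that repeats one command and never makes progress.
    return "look"


def helpful_player(text):
    # Hypothetical player that reacts to the hint in the game's response.
    return "open door" if "closed" in text else "look"


print(progress_in_first_steps(ToyDoorGame(), stuck_player))
print(progress_in_first_steps(ToyDoorGame(), helpful_player))
```

The point of the cap is that a player stuck repeating the same command still yields a comparable number instead of a run that never ends.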