Hacker News
by justindz 48 days ago
What a great lunch read! I've been weekend-warrioring a terminal-based CRPG for a bit myself. I was recently exploring ways to use agents to help with balance testing, which is a real scale problem for a solo indie dev. So far, all I've created is a fight simulator: it takes the current player state (stats, effects, gear, companions, etc.), runs a given fight X times using one of the currently implemented GOAP personalities, and reports how often the player wins or loses, the average end turn, and similar stats.
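
For readers who want something concrete, here is a minimal sketch of that kind of fight simulator. Everything in it is an assumption made for illustration: the `Fighter` stats, the damage formula, and the two stand-in personality functions (`aggressive`, `cautious`) are hypothetical and much simpler than real GOAP planners or the game described above.

```python
import random
from dataclasses import dataclass, replace
from statistics import mean


@dataclass
class Fighter:
    # Hypothetical, heavily simplified stand-in for the full player state.
    hp: int
    attack: int
    defense: int


def aggressive(me, foe):
    """Stand-in for a GOAP personality: always attack."""
    return "attack"


def cautious(me, foe):
    """Stand-in personality: defend when low on health, otherwise attack."""
    return "defend" if me.hp <= 10 else "attack"


def simulate_fight(player, enemy, personality, rng, max_turns=100):
    """Run one fight on copies of the fighters; return (player_won, end_turn)."""
    p, e = replace(player), replace(enemy)  # copy so the inputs stay untouched
    for turn in range(1, max_turns + 1):
        action = personality(p, e)
        if action == "attack":
            e.hp -= max(1, p.attack + rng.randint(0, 3) - e.defense)
            if e.hp <= 0:
                return True, turn
        # The enemy always attacks back; defending halves the incoming damage.
        damage = max(1, e.attack + rng.randint(0, 3) - p.defense)
        if action == "defend":
            damage = max(1, damage // 2)
        p.hp -= damage
        if p.hp <= 0:
            return False, turn
    return False, max_turns  # treat a timeout as a loss


def run_trials(player, enemy, personality, n, seed=0):
    """Repeat the fight n times with a seeded RNG and summarize the outcomes."""
    rng = random.Random(seed)
    results = [simulate_fight(player, enemy, personality, rng) for _ in range(n)]
    wins = sum(1 for won, _ in results if won)
    return {
        "wins": wins,
        "losses": n - wins,
        "win_rate": wins / n,
        "avg_end_turn": mean(turn for _, turn in results),
    }


print(run_trials(Fighter(30, 6, 2), Fighter(25, 5, 1), aggressive, n=1000))
```

Seeding the RNG keeps a run reproducible, so a balance change can be compared against the same sequence of dice rolls.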

I hadn't really thought about trying to create a harness for agents to play the full game interactively. I'd love to explore this. If you don't mind, here are a few questions:

1) Is it correct to assume that I probably need a text-only harness even though my game is already text-based, since I rely on menu selections made with the arrow keys and Enter?

2) Do you have prompt recommendations for the type of feedback you have found to be useful? I would guess that in your case the objectives of the game are clearer than in an open-world RPG. What dead ends have you run into? Maybe a variety of approaches would be good: one agent tries to fight everything, while another focuses on picking up and completing as many quests as possible?

3) How bad is the token burn when doing this? Any optimization strategies you've employed?

2 comments

I did something similar, but instead of having the LLM play the game I had it build an entire bot system to play the game. Bots require much more determinism, but I'd rather burn tokens encoding problem-solving approaches and bot decision profiles than use LLMs for every turn of the game. This can be developed rapidly if you run an agent in a loop and give it a goal like "figure out how to have the bot reach room 3 in under 10 actions" or something like that. It is easy for this to get bloated, but I found it makes a nice feedback loop that lets me quickly test things like pacing changes and think of the game as a series of user actions that can be sculpted purposefully.
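
A goal like "reach room 3 in under 10 actions" can be turned into a deterministic check that an agent loop reruns after each change to the bot. The sketch below is only an illustration under stated assumptions: the `ROOMS` map, the action names, and the breadth-first planner are all hypothetical, and it does not claim to reflect the bot system described in the comment above.

```python
from collections import deque

# Hypothetical room graph: room -> {action: destination room}.
ROOMS = {
    "room1": {"north": "hall"},
    "hall": {"south": "room1", "east": "room2", "north": "vault"},
    "room2": {"west": "hall", "north": "room3"},
    "vault": {"south": "hall"},
    "room3": {"south": "room2"},
}


def plan_route(rooms, start, goal):
    """Breadth-first search: shortest list of actions to the goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        room, actions = queue.popleft()
        if room == goal:
            return actions
        for action, nxt in rooms[room].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None


def run_bot(rooms, start, actions):
    """Replay the actions deterministically and return the final room."""
    room = start
    for action in actions:
        room = rooms[room][action]
    return room


def check_goal(rooms, start, goal, max_actions):
    """True if the bot reaches the goal in fewer than max_actions actions."""
    actions = plan_route(rooms, start, goal)
    ok = (
        actions is not None
        and len(actions) < max_actions
        and run_bot(rooms, start, actions) == goal
    )
    return ok, actions


ok, actions = check_goal(ROOMS, "room1", "room3", max_actions=10)
print(ok, actions)
```

Because the check involves no LLM calls, it costs no tokens to rerun after a pacing or map change.
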

Thanks, this is another great idea and I'll consider it as an addition or alternative. Do you think this works in an open-world, non-linear type game?

OP here: Thank you, and I appreciate the thoughtful questions. To answer: 1) I used a text representation because it made sense for my game and let me "render" certain details in a more AI-friendly way, like the compact map. You could use something like agent-browser and it would probably work just fine, but I figured it added an extra layer of indirection that I didn't need, plus it would be a lot of screenshots! Being able to have a turn-based loop really helped make this work.

2) I had a skill on just how to use the playtest server. I also gave it context on what the game is and how to play it. From there, it probably depends on your use case. I wasn't that impressed with its natural ability to playtest for bug discovery, so I would consider making a skill describing what a playtester would normally do. Focused playtester instances are a good idea. Ultimately, what I found most helpful was to point it at a feature or bug that I was aware of and have it validate it. Not only was it fairly successful, but that was also the part that saved me the most time.

3) I think I only burned about 300K tokens on my longest playtest session, and that includes a bunch of code tweaks too. Running it after every feature as a validation step is pretty cheap. Running it overnight in "open" playtesting mode could add up.
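
To make the text harness from point 1 more concrete, here is a minimal sketch of a turn-based loop that renders a compact map as plain text. It is an illustration only, not the actual playtest server: the `TextHarness` class, the single-letter commands, and the map symbols (`#` wall, `.` floor, `@` player) are all assumptions.

```python
class TextHarness:
    """Hypothetical turn-based text harness: one command in, one text frame out."""

    MOVES = {"n": (-1, 0), "s": (1, 0), "w": (0, -1), "e": (0, 1)}

    def __init__(self, grid, player):
        self.grid = [list(row) for row in grid]
        self.player = player  # (row, col)
        self.turn = 0

    def render(self):
        """Return the turn counter plus a compact map with '@' for the player."""
        rows = []
        for r, row in enumerate(self.grid):
            rows.append(
                "".join("@" if (r, c) == self.player else ch for c, ch in enumerate(row))
            )
        return f"turn {self.turn}\n" + "\n".join(rows)

    def step(self, command):
        """Apply one command, advance the turn, and return the new frame."""
        if command not in self.MOVES:
            # Unknown input does not consume a turn; report it back to the agent.
            return f"unknown command: {command!r}\n" + self.render()
        dr, dc = self.MOVES[command]
        r, c = self.player[0] + dr, self.player[1] + dc
        in_bounds = 0 <= r < len(self.grid) and 0 <= c < len(self.grid[r])
        if in_bounds and self.grid[r][c] != "#":
            self.player = (r, c)  # walls and map edges block movement
        self.turn += 1
        return self.render()


harness = TextHarness(["#####", "#...#", "#####"], player=(1, 1))
print(harness.render())
print(harness.step("e"))
```

Each call to `step` returns a complete frame, so an agent never needs a screenshot or a key-by-key terminal session to know the current state.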

Good luck, please let me know how it goes if you get somewhere helpful!