Hacker News new | ask | show | jobs
by levocardia 472 days ago
Really cool work. It seems like some critical areas (team rocket, safari zone) rely on encoding game knowledge into the reward function somehow, which "smuggles in" external intelligence about the game. A lot of these are related to planning, which makes me wonder whether you could "bolt on" an LLM to do things like steer the RL agent, dynamically choose what to reward, or even do some of the planning itself. Do you think there's any low-hanging fruit on this front?
2 comments

For well-known games like "Pokemon Red" I wonder how much of that game knowledge would be "smuggled in" by an LLM in it's training data if you just replaced the external info in the reward function with it/used it to make up for other deficiencies.

I think they allude to this in their conclusion, but it's less about the low-hanging fruit and more about designing a system to feedback game dialogue into the RL decision making process in a way that can be mutated as part of the RL(be it an LLM or something else)

Wrote about this in the results section. I think there is a way to mix the two and simplify the rewards in the process. A lot of the magic behind getting the agent to teach and use cut probably could have been handled by an LLM.