HN Mirror



	by levocardia 472 days ago
	Really cool work. It seems like some critical areas (Team Rocket, Safari Zone) rely on encoding game knowledge into the reward function, which "smuggles in" external intelligence about the game. A lot of these are related to planning, which makes me wonder whether you could "bolt on" an LLM to do things like steer the RL agent, dynamically choose what to reward, or even do some of the planning itself. Do you think there's any low-hanging fruit on this front?

2 comments

Xelynega 472 days ago

For well-known games like "Pokemon Red" I wonder how much of that game knowledge would be "smuggled in" by an LLM through its training data if you just used it to replace the external info in the reward function, or to make up for other deficiencies.

I think they allude to this in their conclusion, but it's less about the low-hanging fruit and more about designing a system that feeds game dialogue back into the RL decision-making process in a way that can be mutated as part of the RL (be it an LLM or something else).

drubs 472 days ago

Wrote about this in the results section. I think there is a way to mix the two and simplify the rewards in the process. A lot of the magic behind getting the agent to teach and use Cut probably could have been handled by an LLM.