


	by bubblyworld 473 days ago
	What an awesome project! I'm curious - I would have thought that rewarding unique coordinates would be enough to get the agent to (eventually) explore all areas, including the key ones. What did the agents end up doing before key areas got an extra reward? (and how on earth did you port Pokémon Red to an RL environment? O.o)
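For readers following along, here is a minimal sketch of what "rewarding unique coordinates" with an extra reward for key areas could look like. It is not the project's actual reward code: the class name, the `(map_id, x, y)` tile key, and the `key_maps` / `key_bonus` parameters are all assumptions made for illustration.

```python
class UniqueCoordReward:
    """Hypothetical exploration reward: pays once per newly visited tile."""

    def __init__(self, bonus=1.0, key_maps=None, key_bonus=0.0):
        self.bonus = bonus                    # reward for any new tile
        self.key_maps = set(key_maps or ())   # map ids treated as "key areas" (assumed)
        self.key_bonus = key_bonus            # extra reward for new tiles in key areas
        self.seen = set()

    def reset(self):
        # Forget visited tiles at the start of each episode.
        self.seen.clear()

    def __call__(self, map_id, x, y):
        coord = (map_id, x, y)
        if coord in self.seen:
            return 0.0                        # revisits earn nothing
        self.seen.add(coord)
        extra = self.key_bonus if map_id in self.key_maps else 0.0
        return self.bonus + extra
```

With `key_bonus=0.0` every new tile is worth the same, so nothing pulls the agent toward one particular area; raising `key_bonus` for chosen maps is one simple way to weight key areas more heavily.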


drubs 473 days ago

The environments wouldn't concentrate enough in the Rocket Hideout beneath Celadon Game Corner. The agent would have the player wander the world reward hacking. With wild battles enabled, the environments would end up in Lavender Tower fighting Gastly.

> (and how on earth did you port Pokémon Red to an RL environment? O.o)

Read and find out :)


bubblyworld 473 days ago

Thanks haha, I kept reading =D I see, so it's not just that you have to visit the key areas, they need to show up in the episodes enough to provide a signal for training.


drubs 473 days ago

Yup!


wegfawefgawefg 473 days ago

you don't port it, you wrap it. you can put anything in an RL environment. usually emulators are done with BizHawk and some Lua. worst case there's FFI or screen capture.
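A rough sketch of the "wrap it" idea, for anyone who hasn't seen one: the environment is a thin layer with `reset` and `step` that forwards button presses to an emulator and reads state back out. `FakeEmulator` and its methods (`load_state`, `press`, `read_position`) are made-up stand-ins, not the API of BizHawk or any real emulator binding, and the reward is a toy one.

```python
class FakeEmulator:
    """Stand-in for a real emulator binding (hypothetical API)."""

    def __init__(self):
        self.x = 0

    def load_state(self):
        self.x = 0                      # pretend to restore a save state

    def press(self, button):
        if button == "right":
            self.x += 1
        elif button == "left":
            self.x = max(0, self.x - 1)

    def read_position(self):
        return self.x                   # pretend to read a value from game memory


class EmulatorEnv:
    """Wraps an emulator behind a reset/step interface."""

    ACTIONS = ("left", "right")

    def __init__(self, emulator, max_steps=10):
        self.emu = emulator
        self.max_steps = max_steps
        self.steps = 0
        self.best = 0

    def reset(self):
        self.emu.load_state()
        self.steps = 0
        self.best = 0
        return self.emu.read_position()

    def step(self, action):
        self.emu.press(self.ACTIONS[action])
        self.steps += 1
        obs = self.emu.read_position()
        reward = 1.0 if obs > self.best else 0.0   # toy reward: new furthest position
        self.best = max(self.best, obs)
        done = self.steps >= self.max_steps
        return obs, reward, done
```

Swapping `FakeEmulator` for a real binding (a Lua bridge, FFI, or screen capture) would leave the `EmulatorEnv` interface unchanged, which is the point of wrapping rather than porting.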


bubblyworld 473 days ago

Right, my thought was that this would be way too slow for episode rollout (versus an accelerated implementation in JAX or something), but I guess not!


wegfawefgawefg 471 days ago

well, that's the golden issue with RL: sample efficiency. it is env-bound, so you want an architecture that extracts the max possible information from each collected sample, avoiding catastrophic forgetting and prioritizing samples according to relevance.
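One possible reading of "prioritizing samples according to relevance" is a replay buffer that draws stored transitions in proportion to a priority score. The sketch below is a simplified, assumed version of that idea (no importance-sampling correction, linear-time sampling); the class name and the `alpha` exponent are illustrative choices, not something stated in the thread.

```python
import random


class PrioritizedBuffer:
    """Toy replay buffer that samples transitions in proportion to priority**alpha."""

    def __init__(self, capacity, alpha=0.6, seed=None):
        self.capacity = capacity
        self.alpha = alpha              # 0 gives uniform sampling, 1 fully proportional
        self.items = []
        self.priorities = []
        self.next = 0                   # slot to overwrite once the buffer is full
        self.rng = random.Random(seed)

    def add(self, transition, priority=1.0):
        p = max(priority, 1e-6) ** self.alpha   # keep weights strictly positive
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(p)
        else:
            self.items[self.next] = transition  # overwrite the oldest entry
            self.priorities[self.next] = p
        self.next = (self.next + 1) % self.capacity

    def sample(self, k):
        # Draw k transitions with replacement, weighted by stored priority.
        return self.rng.choices(self.items, weights=self.priorities, k=k)
```

In practice the priority is often something like the magnitude of the last training error for that transition, so that surprising samples get replayed more often than ones the model already predicts well.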


drubs 473 days ago

My first version of this project 5 years ago involved a Python-Lua named pipe using BizHawk, actually. No clue where that code went.
