| HN Mirror



	by scotty79 1816 days ago
	Just remember that you are optimizing for what you actually encoded in your rewards, your system, and your evaluation procedure, not for the narrative you constructed about what you think you are doing. I had my own experience with this when I tried to train a "rat" to get out of a maze. I rewarded rats for exiting, but in some of the simple labyrinths I generated for testing it was possible to exit just by going straight ahead, so that strategy quickly dominated my testing population.
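
A rough sketch of how that kind of loophole could be caught before training, not the setup from the comment itself. Everything here is assumed for illustration: the grid encoding (`#` wall, `.` floor, `S` start, `E` exit), the two toy mazes, the agent starting out facing east, and the helper names `straight_ahead_reward` and `degenerate_mazes`. The idea is to run the laziest policy you can think of against the test mazes and see which ones it already solves.

```python
# Hypothetical grid mazes: '#' wall, '.' floor, 'S' start, 'E' exit.
# Assumption: the agent starts facing east (to the right).
MAZES = {
    "corridor": [
        "#######",
        "#S...E#",
        "#######",
    ],
    "one_turn": [
        "#####",
        "#S..#",
        "###.#",
        "###E#",
        "#####",
    ],
}


def find(maze, ch):
    """Return (row, col) of the first cell containing ch."""
    for r, row in enumerate(maze):
        c = row.find(ch)
        if c != -1:
            return r, c
    raise ValueError(f"{ch!r} not in maze")


def straight_ahead_reward(maze, max_steps=100):
    """Reward (1 for exiting, else 0) earned by a policy that never turns."""
    r, c = find(maze, "S")
    for _ in range(max_steps):
        c += 1  # always step east, never turn
        if maze[r][c] == "#":
            return 0  # ran into a wall
        if maze[r][c] == "E":
            return 1  # reached the exit without making any decision
    return 0


def degenerate_mazes(mazes):
    """Names of mazes that the never-turn policy already solves."""
    return [name for name, m in mazes.items() if straight_ahead_reward(m) == 1]


print(degenerate_mazes(MAZES))
```

Any maze this reports pays out the full exit reward to an agent that learned nothing about navigation, so it probably should be dropped from the test set or the reward should be changed.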