Hacker News
by rtkwe 1816 days ago
There are several incentive fixes: change the negative incentive to a factor that discounts the reward for catching a sheep, add a negative incentive for dying, or add a positive incentive for being alive at the end of the simulation. The failure here was that they didn't think about what happens when the agent can't achieve a positive score, i.e., can't catch a sheep.
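A minimal Python sketch of why the three fixes change the agent's choice. It assumes the original scheme was a catch reward plus a per-step negative reward, and that an episode ends on a catch, a death, or a step limit. Every constant, the `Episode` record, and the function names are hypothetical, chosen only for illustration. The sketch trains nothing; it just scores two outcomes for an agent that can't catch a sheep: die immediately, or survive until the end.

```python
from dataclasses import dataclass
from typing import Optional

# All constants are illustrative assumptions, not values from the original simulation.
CATCH_REWARD = 10.0    # reward for catching a sheep
STEP_PENALTY = 0.1     # assumed per-step negative incentive in the original scheme
MAX_STEPS = 100        # episode length if the agent neither catches nor dies
DECAY = 0.98           # fix 1: per-step decay applied to the catch reward
DEATH_PENALTY = 20.0   # fix 2: must exceed STEP_PENALTY * MAX_STEPS to have any effect
SURVIVAL_BONUS = 20.0  # fix 3: paid only if the agent is alive when the episode ends


@dataclass
class Episode:
    steps: int                # number of steps before the episode ended
    caught_at: Optional[int]  # step at which a sheep was caught, or None
    died: bool                # True if the episode ended because the agent died


def original(ep: Episode) -> float:
    """Assumed original: catch reward minus a penalty for every step survived."""
    reward = -STEP_PENALTY * ep.steps
    if ep.caught_at is not None:
        reward += CATCH_REWARD
    return reward


def discounted_catch(ep: Episode) -> float:
    """Fix 1: no per-step penalty; a later catch is simply worth less."""
    if ep.caught_at is None:
        return 0.0
    return CATCH_REWARD * DECAY ** ep.caught_at


def death_penalty(ep: Episode) -> float:
    """Fix 2: original scheme plus a penalty for dying."""
    return original(ep) - (DEATH_PENALTY if ep.died else 0.0)


def survival_bonus(ep: Episode) -> float:
    """Fix 3: original scheme plus a bonus for being alive at the end."""
    return original(ep) + (0.0 if ep.died else SURVIVAL_BONUS)


# The case nobody planned for: the agent cannot catch a sheep at all.
die_early = Episode(steps=1, caught_at=None, died=True)
survive = Episode(steps=MAX_STEPS, caught_at=None, died=False)

schemes = {
    "original": original,
    "discounted_catch": discounted_catch,
    "death_penalty": death_penalty,
    "survival_bonus": survival_bonus,
}

if __name__ == "__main__":
    for name, scheme in schemes.items():
        a, b = scheme(die_early), scheme(survive)
        choice = "die early" if a > b else ("indifferent" if a == b else "survive")
        print(f"{name:17s} die_early={a:7.2f} survive={b:7.2f} -> {choice}")
```

With these assumed numbers, the original scheme prefers dying at once (-0.10 vs -10.00). The discounted catch reward scores both outcomes at 0, so dying is no longer rewarded, though it isn't discouraged either. The death penalty and the survival bonus both make surviving strictly better. The death penalty only works if it is larger than the most time penalty the agent could ever accumulate; otherwise dying early can still come out ahead.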