Hacker News
by cosmoharrigan 3658 days ago
The original papers [1][2] for the deep Q-network used reward clipping: "As the scale of scores varies greatly from game to game, we clipped all positive rewards at 1 and all negative rewards at -1, leaving 0 rewards unchanged".
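
For anyone who wants to see it concretely, here is a minimal sketch of that clipping in Python. The function name `clip_reward` is my own and is not taken from the papers or their code. Assuming the raw rewards are integer score changes, as in the Atari games, clamping to [-1, 1] is the same as taking the sign of the reward.

```python
def clip_reward(reward: float) -> float:
    """Clamp a raw game reward to the range [-1, 1].

    Positive rewards are capped at 1, negative rewards at -1,
    and a reward of 0 is left unchanged.
    """
    return max(-1.0, min(1.0, float(reward)))


if __name__ == "__main__":
    # Hypothetical raw score changes from different games.
    for raw in (200, 1, 0, -5):
        print(raw, "->", clip_reward(raw))
```

The trade-off is that the agent can no longer distinguish a reward of 200 from a reward of 1, since both become 1 after clipping.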

[1] https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

[2] http://home.uchicago.edu/~arij/journalclub/papers/2015_Mnih_...