by msohcw 3658 days ago
Yes, the game score is fed in as the reward value. If I'm not mistaken, they didn't normalise it across games in the initial paper, but normalised it to some range in some of the follow-up experiments.
1 comment

The original papers [1][2] for the deep Q-network used reward clipping: "As the scale of scores varies greatly from game to game, we clipped all positive rewards at 1 and all negative rewards at -1, leaving 0 rewards unchanged".
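
For anyone who wants to see what that clipping amounts to, here is a minimal Python sketch. It only follows the quoted sentence; the name `clip_reward` and the sample scores are made up for illustration and are not taken from the papers' code.

```python
def clip_reward(reward: float) -> float:
    """Clip a raw game reward as described in the quote above.

    Positive rewards become 1, negative rewards become -1,
    and a reward of 0 is left unchanged.
    (Hypothetical helper, not the authors' implementation.)
    """
    if reward > 0:
        return 1.0
    if reward < 0:
        return -1.0
    return 0.0


# Made-up raw score changes from games with very different scales.
raw_rewards = [0, 10, 200, -5, 1]
clipped = [clip_reward(r) for r in raw_rewards]
```

With this, a 200-point reward in one game and a 1-point reward in another both reach the learner as 1, so the magnitude information is dropped but the scale of the rewards is the same for every game.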

[1] https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

[2] http://home.uchicago.edu/~arij/journalclub/papers/2015_Mnih_...