

	by msohcw 3658 days ago
	Yes, the game score is fed in as the reward value. If I'm not mistaken, they didn't normalise it across games in the initial paper, but did normalise it to some range in some of the follow-up experiments.


cosmoharrigan 3657 days ago

The original papers [1][2] for the deep Q-network used reward clipping: "As the scale of scores varies greatly from game to game, we clipped all positive rewards at 1 and all negative rewards at -1, leaving 0 rewards unchanged".

[1] https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

[2] http://home.uchicago.edu/~arij/journalclub/papers/2015_Mnih_...
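
For anyone who wants to see what that clipping amounts to, here is a minimal sketch in Python. It is not taken from the papers' code: the function name `clip_reward` and the example score changes are made up for illustration. It assumes the reward is the change in game score at each step. For integer score changes, clipping to [-1, 1] gives the same result as taking the sign.

```python
def clip_reward(reward: float) -> float:
    """Clip a raw reward to the range [-1, 1].

    Hypothetical helper for illustration; assumes `reward` is the raw
    per-step score change reported by the game.
    """
    return max(-1.0, min(1.0, float(reward)))


# Made-up per-step score changes from games with very different scales.
raw_rewards = [200, 1, 0, -5, -1000]
clipped = [clip_reward(r) for r in raw_rewards]
print(clipped)
```

After clipping, a +200 step and a +1 step look identical to the learner, so the reward scale is comparable across games, but the agent can no longer tell large rewards from small ones.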