by apophis-ren 369 days ago
It's mentioned in the article, but for really long-horizon tasks you probably don't want a small discount factor.

For example, if rewards are very sparse in a long-horizon task (say, a reward arrives 1000 timesteps after the action that earned it), even a discount factor of 0.99 won't capture it: the reward gets scaled by 0.99^1000 ≈ 4.3 × 10^-5.
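A quick way to check that arithmetic, as a minimal Python sketch. `discounted_weight` and `gamma_for_weight` are hypothetical helper names, the 10% target is an arbitrary illustrative threshold, and 1/(1 - gamma) is just the common rule-of-thumb "effective horizon", not something taken from the article.

```python
def discounted_weight(gamma: float, delay: int) -> float:
    """Factor applied to a reward that arrives `delay` steps in the future."""
    return gamma ** delay


def gamma_for_weight(target_weight: float, delay: int) -> float:
    """Smallest gamma such that gamma ** delay >= target_weight (hypothetical helper)."""
    return target_weight ** (1.0 / delay)


delay = 1000
for gamma in (0.9, 0.99, 0.999):
    # 1 / (1 - gamma) is a rule-of-thumb horizon, not an exact cutoff.
    print(f"gamma={gamma}: weight={discounted_weight(gamma, delay):.2e}, "
          f"effective horizon ~{1 / (1 - gamma):.0f} steps")

# Discount factor needed for a reward 1000 steps out to keep 10% of its value
# (the 10% threshold is an illustrative assumption).
print(f"{gamma_for_weight(0.1, delay):.4f}")
```

With gamma = 0.99 the weight comes out around 4.3e-05, and keeping even 10% of a reward 1000 steps out needs gamma of roughly 0.9977.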

Essentially, if your discount factor is too small for the environment's horizon, some credit assignments become nearly impossible to learn.
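To see how that plays out in the training targets, here's a minimal sketch that computes discounted returns backwards through a made-up sparse episode. The episode and `discounted_returns` are hypothetical, and it assumes `rewards[t]` is the reward received at step t. The return credited to step 0 is just gamma^1000 times the reward, so with gamma = 0.99 the signal for the first action is a few parts in 100,000, which is likely to get lost in the noise of a learned value estimate.

```python
def discounted_returns(rewards: list[float], gamma: float) -> list[float]:
    """G_t = r_t + gamma * G_{t+1}, computed backwards from the end of the episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical sparse episode: the only reward (+1) arrives 1000 steps after step 0.
rewards = [0.0] * 1001
rewards[1000] = 1.0

for gamma in (0.99, 0.999):
    g0 = discounted_returns(rewards, gamma)[0]
    print(f"gamma={gamma}: return credited to step 0 = {g0:.2e}")
```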