by hyperbovine 3407 days ago
Really naive question: can't they just train the net to react instantaneously on a $d$-delayed screen? Conceptually, I don't see why this approach would succeed with $d = 0$ but fail for (say) $d = 25$ ms. (I am too busy/lazy to read the papers and understand what breaks down.)
1 comment

Basically, we tried that and it sort of works, but performance degrades quickly with each added frame of delay. The likely issue is that delay makes credit assignment much harder. Instead of seeing an immediate change in the state (which your critic can interpret as good or bad), you have to wait several frames, during which your previous actions are still taking effect and interfering with the reward signal.
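To make the delay concrete, here is a minimal sketch of what a $d$-frame-delayed screen looks like from the agent's side. This is not the setup from the papers: `ToyEnv`, `DelayedObs`, and `rollout` are hypothetical names made up for illustration, and the "environment" is just a position that moves by whatever action you pick.

```python
from collections import deque


class ToyEnv:
    """Hypothetical toy environment: the state is a position moved by each action."""

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += action
        return self.pos


class DelayedObs:
    """Hypothetical wrapper: the agent sees the state from `delay` steps ago."""

    def __init__(self, env, delay):
        self.env = env
        self.delay = delay
        # Holds the last delay + 1 true states; index 0 is the oldest one.
        self.buf = deque(maxlen=delay + 1)

    def reset(self):
        obs = self.env.reset()
        self.buf.clear()
        # Pad with the initial state so the first frames have something to show.
        for _ in range(self.delay + 1):
            self.buf.append(obs)
        return self.buf[0]

    def step(self, action):
        obs = self.env.step(action)
        self.buf.append(obs)  # the oldest entry falls off the left end
        return self.buf[0]


def rollout(delay, actions):
    """Return the observation the agent sees after each action."""
    env = DelayedObs(ToyEnv(), delay)
    env.reset()
    return [env.step(a) for a in actions]


if __name__ == "__main__":
    actions = [1, 0, 0, 0]
    print(rollout(0, actions))
    print(rollout(2, actions))
```

With no delay, the move made on the first step is visible right away. With a delay of 2, the screen only reflects it two steps later, after the agent has already chosen two more actions, so anything judging "was that a good move?" from the screen has to untangle which of the recent actions caused the change.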