Hacker News
by willwhitney 3401 days ago
Handling delays (and the uncertainty they entail) is a huge challenge, and I think it'll be a rich area of research. The simplest part of the problem is that delays in action or perception also slow the propagation of reward signals, and credit assignment is still a really hard problem.
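
To make the reward-propagation point a bit more concrete, here is a rough sketch in standard discounted-RL notation. The discount gamma, the delay d (in steps), and the choice of a one-step TD(0) critic are assumptions for illustration, not something specified above. If an action's effect on reward only shows up d steps later, its weight in the return shrinks by gamma^d. With one-step TD, that reward also has to be backed up through d extra states before it reaches the state where the action was taken. In the worst case (e.g. tabular TD(0) applied in trajectory order), that means roughly d more passes.

```latex
% Discounted return from time t
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1

% A consequence of a_t that would have arrived as r_{t+1} (weight \gamma^0 = 1)
% instead arrives as r_{t+d+1}, so its weight in G_t becomes \gamma^{d}.

% One-step TD(0) critic update: information moves back one state per update
V(s_t) \leftarrow V(s_t) + \alpha \bigl[\, r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \,\bigr]
```

For example, with gamma = 0.99 a 10-step delay already scales the signal to about 0.90 (0.99^10 ≈ 0.904). Meanwhile, every reward in between also reflects the actions a_{t+1}, ..., a_{t+d}, which is part of what makes credit assignment harder.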

Thinking further afield, future models could learn to adapt their expectations to fit the behavior of a particular opponent. This kind of metalearning is pretty much a wide-open problem, though a pair of (roughly equivalent) papers in this direction recently came out from DeepMind (https://arxiv.org/abs/1611.05763) and OpenAI (https://arxiv.org/abs/1611.02779). It's going to be really exciting to see how these techniques scale.

Really naive question: can't they just train the net to react instantaneously on a d-delayed screen? I don't see conceptually why this approach would succeed with d=0 but fail for (say) d=25ms. (I am too busy/lazy to read the papers and understand what breaks down.)
Basically, we tried that, and it sort of works, but performance degrades pretty fast with each frame of delay. The issue is likely that it makes credit assignment much harder. Instead of seeing an immediate change in the state (which your critic can interpret as good or bad), you have to wait a bunch of frames during which your previous actions are taking effect and interfering with the reward signal.
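
For anyone wondering what "training on a delayed screen" amounts to mechanically, here is a minimal sketch under stated assumptions. The class name DelayedObservations and its reset/push methods are hypothetical, not taken from the papers or any RL library. The delay is counted in frames rather than milliseconds, and at the start of an episode the first frame is repeated until d real frames have arrived. The agent then acts on whatever push returns.

```python
from collections import deque
from typing import Deque, Generic, TypeVar

T = TypeVar("T")


class DelayedObservations(Generic[T]):
    """Hypothetical helper: hands the agent the frame from `delay` steps ago."""

    def __init__(self, delay: int) -> None:
        if delay < 0:
            raise ValueError("delay must be non-negative")
        self.delay = delay
        # Holds the current frame plus the `delay` frames before it.
        self._buffer: Deque[T] = deque(maxlen=delay + 1)

    def reset(self, first_obs: T) -> T:
        # Assumption: pad the start of an episode by repeating the first frame.
        self._buffer.clear()
        self._buffer.extend([first_obs] * (self.delay + 1))
        return self._buffer[0]

    def push(self, obs: T) -> T:
        # Appending to a full bounded deque drops the oldest frame, so
        # index 0 is always the frame from exactly `delay` steps ago.
        self._buffer.append(obs)
        return self._buffer[0]


if __name__ == "__main__":
    screen = DelayedObservations(delay=2)
    print(screen.reset(0))                        # frame seen at t=0
    print([screen.push(t) for t in range(1, 6)])  # frames seen at t=1..5
```

With delay=2, the agent sees frames 0, 0, 0, 1, 2, 3 at steps 0 through 5. Any reward caused by the action at step t can then only be related to an observation that is already two frames stale. That is the lag the critic has to bridge.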