|
|
|
|
|
by willwhitney
3401 days ago
|
|
Handling delays (and the uncertainty they entail) is a huge challenge, and I think it'll be a rich area of research. The simplest part of the problem is that delays in action or perception also slow the propagation of reward signals, and credit assignment is still a really hard problem. Thinking further afield, future models could learn to adapt their expectations to fit the behavior of a particular opponent. This kind of metalearning is pretty much a wide open problem, though a pair of (roughly equivalent) papers in this direction recently came out from DeepMind: https://arxiv.org/abs/1611.05763 and OpenAI: https://arxiv.org/abs/1611.02779 It's going to be really exciting to see how these techniques scale. |
|