by nicklo
3290 days ago
Super exciting to see on-par performance on RL tasks with dramatically less supervision. Really looking forward to a follow-up where they explore 2.2.4 further. Sampling the examples that provide maximal information gain seems like it could yield another huge reduction in the amount of human oversight needed. I could see an adversarial scheme that learns to sample these examples optimally from the manifold. This kind of thing is powerful in human learning of complex tasks: asking for clarification or feedback at specific points of uncertainty.