by sytelus
2436 days ago
Many caveats, but impressive progress in manipulation, especially sim2real:
- Only 20% of attempts are successful on the hardest configs (26+ moves)
- The solving steps are not generated by RL (but could be [1])
- The cube is modified internally to transmit additional state via Bluetooth
- Highly calibrated and fine-tuned environment + MuJoCo-based sim, to match simulation to reality as closely as possible
- The OpenAI Five algorithm is pretty much reused as-is
- Cumulative training time = 13 thousand years, the same order of magnitude as the 40 thousand years
- 32+64 V100 GPUs per training cycle

[1] https://arxiv.org/abs/1805.07470