by flowerthoughts
380 days ago
Assuming that by GDM you mean Google DeepMind: they pioneered RL with deep nets as the policy function estimator. The deep nets were themselves a result of CNNs and massive improvements in hardware parallelization at the time. RL itself was established, at the latest, with Q-learning in 1989: https://en.wikipedia.org/wiki/Q-learning
|
I still think my original statement is fair.