by leecho0 2904 days ago
The article claims that RL is simplistic because it uses an unreasonable amount of data. However, recent advances are significant precisely because they can make use of an unreasonable amount of data. As an example, I don't expect to become as good as Michael Jordan no matter how much I play basketball, or to beat Garry Kasparov no matter how much I play chess. There's a fundamental flaw in my learning algorithm that prevents me from becoming that good at something even if I have infinite experience.

Recent RL research on policy gradients, on-policy vs. off-policy learning, function approximation, and model-based vs. model-free methods is all about how to get good at something with a lot of practice. RL has been around for a long time, and discussions about higher-level learning and planning have been had over and over. One doesn't discount the other. One deals with how to structure the learning problem so that you can continue to get better with more experience (the RL problem), while the other is about how to use higher-level learning to speed that up.