by abeppu
1804 days ago
I think the premise of your question actually points to the real problem. In RL, because your current policy and actions determine what data you see next, you can't really just "scale existing algorithms" in the sense of shoving more of the same data through them on more powerful processors. There's a sequential process of acting, observing, and learning that is bottlenecked on your ability to act in your environment (i.e., through your robot). Off-policy learning exists, but scaling up the amount of data you process from a bad initial policy doesn't really lead anywhere good.
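To make that concrete, here is a minimal toy sketch. Everything in it (the 10-state chain, `step`, `collect`, the two hard-coded policies) is made up for illustration and not taken from any RL library. It shows two things: collecting experience is a step-by-step loop where each action needs the state produced by the previous one, and a bad policy's data covers the same tiny slice of the environment no matter how much of it you gather.

```python
N_STATES = 10  # hypothetical toy chain: states 0..9, reward only at state 9


def step(state, action):
    """Toy environment. action is -1 (left) or +1 (right); walls clamp the move."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def collect(policy, n_steps):
    """Roll out a policy one step at a time, starting from state 0.

    Each action needs the state produced by the previous action, so the
    steps within a rollout have to happen in order.
    """
    state, data = 0, []
    for _ in range(n_steps):
        action = policy(state)
        next_state, reward = step(state, action)
        data.append((state, action, next_state, reward))
        state = 0 if reward > 0 else next_state  # restart after reaching the goal
    return data


def states_seen(data):
    """All states that appear anywhere in the collected transitions."""
    return {s for s, _, _, _ in data} | {s2 for _, _, s2, _ in data}


def total_reward(data):
    return sum(r for _, _, _, r in data)


def always_left(state):   # stand-in for a bad initial policy
    return -1


def always_right(state):  # stand-in for a better policy
    return +1


small = collect(always_left, 100)
large = collect(always_left, 100_000)
good = collect(always_right, 100)

# The bad policy's coverage is identical despite 1000x the data.
print(states_seen(small), states_seen(large))
print(total_reward(large), total_reward(good))
```

In this toy setup, the big batch from `always_left` never contains a rewarded transition, so any learner fed only that batch, off-policy or not, has nothing to tell it that going right is good. Real problems are far messier, but the coverage issue is the same in spirit.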