|
Thanks for the insight on RL. That's good context for me. I would say though that, in my experience, computational cost is rarely the issue with model-based control, because there are various lines of attack, ranging from model simplification (surrogate models, piecewise-affine multi-models, i.e. switching between many simpler local models, etc.) to precomputing the optimal control law [1] to embedding the model in silicon. Also, some optimal models/control laws can actually be parallelized fairly easily (MLD, i.e. mixed logical dynamical, models are expressed as mixed-integer programs, which can be solved in performant ways using parallel algorithms, with some provisos). This is a well-trodden space with a tremendous amount of industry-driven research behind it. Most of these methods come under the Model Predictive Control (MPC) umbrella, which has been studied extensively over 3 decades [2]. The paradigm is extremely simple: (1) Given a model of how output y responds to input u, compute the sequence of u's over the next n time periods that optimizes an objective function. (2) Implement ONLY the first u. (3) Read the sensor value for y (the actual y in the real world). (4) Update your model with the difference between actual y and predicted y, move the prediction window forward, and repeat (feedback). When this is applied recursively, you obtain approximately optimal control on real-life systems even in the presence of model-reality mismatch, noise and bounded uncertainty. If you think about it, this is the paradigm behind many planning strategies -- forecast, take a small action, get feedback, try again. The difference though is that MPC is a strategy with a substantial amount of mathematical theory (including stability analysis, reachability, controllability, etc.), software, and industrial practice behind it. [1] Explicit MPC http://divf.eng.cam.ac.uk/cfes/pub/Main/Presentations/Morari... [2] https://en.wikipedia.org/wiki/Model_predictive_control |
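To make the four-step loop above concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration, not something from the comment: a scalar linear plant, a deliberately wrong model (a_model, b_model), a quadratic tracking objective with a tiny input penalty (lam), no constraints, and a simple additive bias term d as the "model update" in step (4). Real MPC usually adds constraints and hands the optimization to a QP or MIP solver.

```python
import numpy as np

def plan_inputs(x0, d, a, b, r, horizon, lam=1e-4):
    """Step (1): plan `horizon` inputs for the model x[i+1] = a*x[i] + b*u[i] + d.

    Minimizes sum((x_pred - r)^2) + lam * sum(u^2) in closed form
    (unconstrained regularized least squares).
    """
    n = horizon
    # Stacked predictions: x_pred = F*x0 + G@u + H*d
    F = np.array([a ** (i + 1) for i in range(n)])
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            G[i, j] = a ** (i - j) * b
    H = np.array([sum(a ** k for k in range(i + 1)) for i in range(n)])
    target = r - F * x0 - H * d
    return np.linalg.solve(G.T @ G + lam * np.eye(n), G.T @ target)

def run_closed_loop(steps=50):
    a_true, b_true = 0.9, 0.5      # hypothetical "real" plant
    a_model, b_model = 0.8, 0.4    # mismatched model used by the controller
    r = 1.0                        # setpoint for y
    x, d = 0.0, 0.0                # measured output and bias estimate
    history = []
    for _ in range(steps):
        u = plan_inputs(x, d, a_model, b_model, r, horizon=10)[0]  # (2) first u only
        predicted = a_model * x + b_model * u + d
        x = a_true * x + b_true * u                                # (3) "sensor" reading
        d += x - predicted                                         # (4) feed back the error
        history.append(x)
    return history

if __name__ == "__main__":
    print(run_closed_loop()[-5:])
```

Despite the model being wrong in both coefficients, the feedback term in step (4) is what pulls the output onto the setpoint in this toy setup; drop that line and the loop settles at an offset instead.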
In RL, the goal is to find a function that produces actions that maximize the expected value of some reward function. Model-based RL methods typically try to extract a function for 'representing' the environment and employ techniques to optimize action selection over that 'representation' (read 'model' for 'representation'). Model-free RL methods instead try to learn directly which actions to take, without extracting a representation. A good paper describing deep Q-learning, a commonly cited model-free method that was one of the earliest to employ deep learning for a reinforcement learning task, is [1].
I think it's worth clarifying -- RL algorithms as a whole are more akin to search algorithms than to control algorithms. RL algorithms can be used to solve some control problems -- but that is not all they are used for, unless you take an extremely broad view of what constitutes a 'control problem' ... I don't think it would be common to model playing Go as a control problem, for example -- nor would I consider learning how to play every Atari game ever created, given only image frames and the current score and no other pre-supplied knowledge, to be a control problem ...?
(I'm talking way past my familiarity now) -- That said, optimal control theory intersects with RL quite a bit in the foundations -- Q-learning techniques (a foundational family of methods in RL) have proofs that show under what conditions they will converge on the optimal policy -- I believe this mathematics to be quite similar to the mathematics used in optimal control theory...
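For anyone who hasn't seen it, the tabular Q-learning update mentioned above can be sketched in a few lines of Python. The environment here is a made-up 5-state chain (move left or right, reward 1 for reaching the right end), and the step size alpha, discount gamma, and purely random exploration are assumptions chosen for illustration. This is the plain tabular method, not the deep Q-network from [1].

```python
import numpy as np

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical deterministic chain.

    States 0..n_states-1, actions 0 = left, 1 = right.
    Reaching the last state gives reward 1 and ends the episode.
    """
    rng = np.random.default_rng(seed)
    goal = n_states - 1
    q = np.zeros((n_states, 2))
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Behave randomly; Q-learning is off-policy, so it still
            # learns the values of the greedy policy.
            a = int(rng.integers(2))
            s_next = max(s - 1, 0) if a == 0 else s + 1
            reward = 1.0 if s_next == goal else 0.0
            # Bootstrapped target; the terminal state has value 0.
            future = 0.0 if s_next == goal else gamma * q[s_next].max()
            q[s, a] += alpha * (reward + future - q[s, a])
            s = s_next
    return q

if __name__ == "__main__":
    q = q_learning_chain()
    print(q[:-1].argmax(axis=1))  # greedy action in each non-terminal state
```

Note that no model of the chain is ever built: the table is updated only from observed (state, action, reward, next state) samples. The target, reward plus discounted best next value, is a sampled form of the Bellman optimality equation from dynamic programming, which is where the overlap with optimal control shows up.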
[1] Deep Q-Networks https://storage.googleapis.com/deepmind-media/dqn/DQNNatureP...