|
|
|
|
|
by breatheoften
2683 days ago
|
|
(disclaimer: I am not a RL researcher) I think grandparent was using 'model' to refer to model-based or 'value-based' reinforcement learning algorithms (as distinct from 'model-free' methods (ex: 'policy-based' methods)). I don't think they were directly referring to the same 'model' as is meant by MPC. In RL, the goal is to try to find a function that produces actions that optimize the expected reward of some reward function. Model-based RL methods typically try to extract a function for 'representing' the environment and employ techniques to optimize action selection over that 'representation' (replace the word 'representation' with the word 'model'). Model-free RL methods instead try to directly learn to predict which actions to take without extracting a representation. A good paper describing deep q-learning -- a commonly cited model-free method that was one of the earliest to employ deep-learning for a reinforcement learning task [1]. I think it's worth clarifying -- RL algorithms as a whole are more akin to search than to control algorithms. RL algorithms can be used to solve some control problems -- but that is not all they are used for unless you take an extremely broad view about what constitutes a 'control problem' ... I don't think it would be common to model playing 'go' as a control problem for example -- nor would I consider learning how to play all atari games ever created given only image frames and the current score and no other pre-supplied knowledge to be a control problem ...? (I'm talking way passed my familiarity now) -- That said, optimal control theory intersects with RL quite a bit in the foundations -- Q-Learning techniques (a foundational family of methods in RL) have proofs that show under what conditions they will converge on the optimal policy -- I believe this mathematics to be quite similar to the mathematics used in optimal control theory... [1] Deep Q-Networks https://storage.googleapis.com/deepmind-media/dqn/DQNNatureP... |
|
The goal of optimal control is broadly similar to RL in that it aims to optimize some expected reward function by optimizing action selection for implementation in the environment.
The difference is the optimal control does not seek to learn either a representation or a policy in real-time -- it assumes both are known a priori.
Both can be thought of as containing hidden Markov models, though in optimal control the transition functions are assumed to be known whereas in RL they are unknown.
Another difference is that in control theory, we assume there is always a model -- though some models are implicit. You see, control algorithms either assume that the environment is explicitly characterized (model-based, like MPC), or that the controller contains an implicit model of the environment (internal model control principle, i.e. we adjust tuning parameters in PID control... there's no explicit model, but a correctly tuned controller behaves like a model-inverse/mirror of reality). In either of these cases, either the implicit or explicit model are arrived at before hand -- once deployed, no learning or continual updating of the controller structure is done.
In contrast, RL has an exploration (i.e. learning) component that is missing from most control algorithms [1], and actively trades-off exploration vs exploitation. In that sense, RL encompasses a larger class of problems than just control theory, whereas control theory is specialized towards the exploitation part of the exploration vs exploitation spectrum.
[1] Though there are some learning controllers like ILCs (iterative learning control) and adaptive controllers which continually adapt to the environment. They have a weakness (perhaps RL suffers from the same) in that if a transient anomalous event comes through, they learn it and it messes up their subsequent behavior...