Hacker News
by wenc 2682 days ago
Thanks for sharing some really interesting thoughts. Just to add on to your comment...

The goal of optimal control is broadly similar to that of RL: both aim to optimize some expected reward (or, equivalently, minimize an expected cost) by choosing which actions to apply to the environment.

The difference is that optimal control does not seek to learn either a representation of the environment or a policy in real time: it assumes the representation is known a priori and derives the policy from it ahead of time.

Both can be framed as Markov decision processes (partially observable ones when the state is hidden), except that in optimal control the transition functions are assumed to be known, whereas in RL they are unknown.
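To make the known-versus-unknown distinction concrete, here is a minimal Python sketch on a made-up two-state, two-action MDP. The `STEP` table, `GAMMA = 0.9`, and all function names are illustrative assumptions. `value_iteration` plays the optimal-control (dynamic programming) role and reads the transition table directly; `q_learning` plays the RL role and may only call `env_step` to sample transitions. Both should arrive at the same policy.

```python
import random

# Hypothetical toy MDP: STEP[(state, action)] = (next_state, reward).
STEP = {
    (0, 0): (0, 0.0),  # stay in state 0, no reward
    (0, 1): (1, 0.0),  # move to state 1, no reward
    (1, 0): (1, 1.0),  # stay in state 1, reward 1
    (1, 1): (0, 0.0),  # move back to state 0, no reward
}
STATES = ACTIONS = (0, 1)
GAMMA = 0.9  # discount factor (assumed)


def value_iteration(model, iters=500):
    """Planning with a known model: the transition table is read directly."""
    def q(s, a, v):
        s_next, r = model[(s, a)]
        return r + GAMMA * v[s_next]

    v = {s: 0.0 for s in STATES}
    for _ in range(iters):
        v = {s: max(q(s, a, v) for a in ACTIONS) for s in STATES}
    return {s: max(ACTIONS, key=lambda a: q(s, a, v)) for s in STATES}


def env_step(s, a):
    """Black-box environment: the RL agent may call this but never inspects STEP."""
    return STEP[(s, a)]


def q_learning(step_fn, steps=20_000, alpha=0.1, seed=0):
    """Learning from sampled transitions with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    s = 0
    for _ in range(steps):
        a = rng.choice(ACTIONS)
        s_next, r = step_fn(s, a)
        target = r + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])
        s = s_next
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


print(value_iteration(STEP))  # {0: 1, 1: 0}
print(q_learning(env_step))   # {0: 1, 1: 0}
```

The optimal policy is "move to state 1, then stay there"; the planner computes it from the table, while the learner has to discover it by trial.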

Another difference is that in control theory we always assume there is a model, though sometimes it is only implicit. Control algorithms either assume that the environment is explicitly characterized (model-based control, e.g. MPC), or that the controller contains an implicit model of the environment (internal model control principle). In PID control, for example, we only adjust tuning parameters: there is no explicit model, but a correctly tuned controller behaves like a model-inverse, or mirror, of reality. In either case the model, implicit or explicit, is arrived at beforehand; once the controller is deployed, no learning or continual updating of its structure is done.
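As an illustration of a controller with no explicit model, here is a minimal discrete PID loop in Python. The first-order plant, its time constant `tau`, and the gains `kp=2.0, ki=1.0, kd=0.1` are assumptions chosen for the sketch; the point is that `PID.update` never references the plant, and its gains are fixed before the simulation starts.

```python
class PID:
    """Textbook discrete-time PID controller; gains are fixed before deployment."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the very first step

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(pid, setpoint=1.0, tau=1.0, steps=2000):
    """Assumed first-order plant dy/dt = (u - y) / tau, stepped with forward Euler.
    Only this function knows tau; the controller sees nothing but the error."""
    y = 0.0
    for _ in range(steps):
        u = pid.update(setpoint, y)
        y += pid.dt * (u - y) / tau
    return y


final = simulate(PID(kp=2.0, ki=1.0, kd=0.1, dt=0.01))
print(f"output after 20 s: {final:.3f}")
```

The integral term drives the steady-state error to zero, so `final` should sit very close to the setpoint of 1.0 even though the controller is never told `tau`.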

In contrast, RL has an exploration (i.e. learning) component that is missing from most control algorithms [1], and it actively trades off exploration against exploitation. In that sense, RL encompasses a larger class of problems than control theory does, whereas control theory is specialized towards the exploitation end of the exploration-exploitation spectrum.
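The exploration/exploitation trade-off is easiest to see in a multi-armed bandit, the simplest RL setting. Below is a minimal epsilon-greedy sketch, assuming three hypothetical Bernoulli arms with payout probabilities 0.2, 0.5 and 0.8. With `epsilon=0` the agent purely exploits what it has seen and, with ties broken toward the lowest index, never leaves arm 0; a little random exploration lets it find the better arm.

```python
import random


def run_bandit(probs, epsilon, steps=5000, seed=1):
    """Epsilon-greedy on Bernoulli arms; returns how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    estimates = [0.0] * k
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: try a random arm
        else:
            arm = max(range(k), key=lambda i: estimates[i])  # exploit; ties -> lowest index
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental sample mean
    return counts


probs = [0.2, 0.5, 0.8]  # hypothetical arm payout probabilities
print("greedy:        ", run_bandit(probs, epsilon=0.0))  # [5000, 0, 0]
print("epsilon-greedy:", run_bandit(probs, epsilon=0.1))
```

The purely greedy agent is the extreme "exploitation only" case; epsilon-greedy is just one simple exploration strategy among many.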

[1] There are some learning controllers, though, such as ILC (iterative learning control) and adaptive controllers, which continually adapt to the environment. They have a weakness (perhaps RL suffers from the same one): if a transient anomalous event comes through, they learn it, and it messes up their subsequent behavior.
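A sketch of the failure mode in the footnote, assuming the adaptive element is a scalar recursive least squares (RLS) estimator with forgetting factor `lam`, a common choice in self-tuning regulators. The plant gain of 2.0 and the single corrupted measurement are made up for illustration: the estimate jumps when the anomaly arrives and only decays back over later samples.

```python
def rls_gain_estimates(samples, lam=0.95, theta0=0.0, p0=1000.0):
    """Scalar recursive least squares with exponential forgetting, fitting y ~ theta * u.
    Returns the running estimate of theta after each (u, y) sample."""
    theta, p = theta0, p0
    history = []
    for u, y in samples:
        k = p * u / (lam + u * p * u)  # gain: how much to trust the new sample
        theta += k * (y - u * theta)   # correct the estimate by the prediction error
        p = (p - k * u * p) / lam      # covariance update; lam < 1 discounts old data
        history.append(theta)
    return history


TRUE_GAIN = 2.0                            # hypothetical plant gain
samples = [(1.0, TRUE_GAIN * 1.0)] * 200   # clean input/output pairs
samples[50] = (1.0, 20.0)                  # one transient anomalous measurement
est = rls_gain_estimates(samples)
print(f"before: {est[49]:.3f}  right after: {est[50]:.3f}  at the end: {est[199]:.3f}")
```

With `lam = 0.95` the estimator's effective memory is roughly 1 / (1 - lam) = 20 samples, so a smaller `lam` adapts faster to genuine changes but also reacts more strongly to one-off anomalies like this one.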


I’m not sure how comparable notions from adaptive control theory are to “reinforcement learning”. “Adaptive” obviously isn’t a perfectly defined word, but your usage makes me think you might be pondering applying RL to non-stationary environments, and I’m not sure RL would currently be likely to perform well there. Many reinforcement learning techniques _do_ require the environment to be approximately stationary (or at least perform much better when it is). It can of course be stochastic, but the distributions should be mostly fixed, or else convergence challenges are likely to be exacerbated.
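A minimal illustration of why stationarity matters, assuming a noise-free reward whose mean jumps from 1.0 to 5.0 halfway through (made-up numbers). The 1/n sample average, whose decaying step size is the kind many convergence results rely on, ends up stuck between the old and new means, while a constant step size tracks the change at the cost of never fully settling down once noise is present.

```python
def track(rewards, step_size=None):
    """Incremental value estimate. step_size=None uses 1/n (sample average);
    a constant step_size weights recent rewards more heavily."""
    estimate, n = 0.0, 0
    for r in rewards:
        n += 1
        alpha = 1.0 / n if step_size is None else step_size
        estimate += alpha * (r - estimate)
    return estimate


# Hypothetical non-stationary reward: the mean jumps from 1.0 to 5.0 halfway through.
rewards = [1.0] * 500 + [5.0] * 500
print(round(track(rewards), 3))                 # 3.0 (sample average)
print(round(track(rewards, step_size=0.1), 3))  # 5.0 (constant step size)
```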