With all the excitement surrounding RL, I have yet to see substantial practical applications of RL in real life apart from games and a few articles I read on how companies use RL for recommendations or ad suggestions.
So, as an outsider, it is puzzling to understand what generates this excitement.
Reinforcement learning is a solid fit for a number of traditional operations research problems. (In fact, that's pretty much what motivated research into RL in the first place, as I understand it.)
One concrete example I heard about was using reinforcement learning to price airline tickets. Behind the scenes, airlines break up the tickets on a single plane into a large number of distinctly priced types. The question of how much of each type of ticket to offer at what price and how to change this over time (as the actual flight is coming closer and closer) is a massive optimization problem that's too large to solve exactly. Reinforcement learning coupled with simulation can find good solutions if you set up the feature space correctly. (In this case, I remember that the only feature that ultimately mattered was either total profit or total revenue for the mix of tickets being offered.)
One thing to note here is that this is using "normal" reinforcement learning, not "deep" reinforcement learning. You can get away with a simple function approximation of the state instead of reaching for a neural network. This seems true for most operations research problems where you would reach for reinforcement learning: modeling the state by hand works well enough and has the important benefit of being easier to understand and interpret. The "deep" part becomes useful when your state space is so large and complex that other techniques become infeasible.
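To make the airline example a bit more concrete, here is a minimal sketch of "normal" RL with a hand-built linear state representation trained against a simulator. Everything in it is hypothetical: a toy flight with `capacity` seats and a `horizon`-day booking window, four made-up fare levels, a made-up demand model where at most one seat sells per day, and four hand-picked features. It uses semi-gradient SARSA with a linear Q-function, which is one standard choice for this setting; it is not the method used by the airline in the anecdote.

```python
import random

# Hypothetical toy setup: one flight, a handful of fare levels, one booking window.
PRICES = [50.0, 100.0, 150.0, 200.0]
MAX_PRICE = 250.0
N_FEATURES = 4


def sale_probability(price):
    # Made-up demand model: at most one seat sells per day, cheaper fares sell more often.
    return max(0.0, 1.0 - price / MAX_PRICE)


def features(days_left, seats_left, horizon, capacity):
    # Hand-crafted state features (no neural network): bias, time left, seats left, interaction.
    d = days_left / horizon
    s = seats_left / capacity
    return [1.0, d, s, d * s]


def q_value(weights, phi, action):
    # Linear action-value estimate: one weight vector per fare level.
    return sum(w * x for w, x in zip(weights[action], phi))


def greedy(weights, phi):
    return max(range(len(PRICES)), key=lambda a: q_value(weights, phi, a))


def train(horizon=30, capacity=10, episodes=2000, alpha=0.01, epsilon=0.1, seed=0):
    """Semi-gradient SARSA against a simulator of the booking window."""
    rng = random.Random(seed)
    weights = [[0.0] * N_FEATURES for _ in PRICES]

    def choose(phi):
        # Epsilon-greedy exploration over fare levels.
        if rng.random() < epsilon:
            return rng.randrange(len(PRICES))
        return greedy(weights, phi)

    for _ in range(episodes):
        days, seats = horizon, capacity
        phi = features(days, seats, horizon, capacity)
        action = choose(phi)
        while True:
            sold = rng.random() < sale_probability(PRICES[action])
            reward = PRICES[action] / MAX_PRICE if sold else 0.0  # revenue, rescaled
            days -= 1
            seats -= int(sold)
            done = days == 0 or seats == 0  # flight departed or sold out
            if done:
                target = reward
            else:
                phi_next = features(days, seats, horizon, capacity)
                action_next = choose(phi_next)
                target = reward + q_value(weights, phi_next, action_next)
            td_error = target - q_value(weights, phi, action)
            # The gradient of a linear Q-function w.r.t. its weights is the feature vector.
            weights[action] = [w + alpha * td_error * x for w, x in zip(weights[action], phi)]
            if done:
                break
            phi, action = phi_next, action_next
    return weights


def evaluate(weights, horizon=30, capacity=10, episodes=1000, seed=1):
    """Average revenue of the greedy policy over simulated booking windows."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        days, seats = horizon, capacity
        while days > 0 and seats > 0:
            price = PRICES[greedy(weights, features(days, seats, horizon, capacity))]
            if rng.random() < sale_probability(price):
                total += price
                seats -= 1
            days -= 1
    return total / episodes
```

The learned weights are just a few numbers per fare level, which is the interpretability benefit mentioned above: you can read off how the value of each fare shifts as days or seats run out. Rewards are divided by `MAX_PRICE` only to keep the step size well-behaved.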
One of the core equations in reinforcement learning is the Bellman equation, named after Richard Bellman, the inventor of dynamic programming. In fact, in the operations research community, reinforcement learning is often referred to as "approximate dynamic programming". There are lots of extremely boring but quite effective techniques for solving real industrial problems in this framework, without any neural networks at all.
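When a problem is small enough, the Bellman equation can be solved exactly by backward induction; "approximate" dynamic programming replaces the exact table with an approximation once the state space gets too large. A minimal sketch on a hypothetical toy version of the ticket-pricing problem (made-up fares and demand model, at most one sale per day), where V_t(s) is the best expected revenue with t days and s seats left and q(p) is the chance a seat sells at fare p:

V_t(s) = max over p of [ q(p) * (p + V_{t-1}(s-1)) + (1 - q(p)) * V_{t-1}(s) ],  with V_0(s) = V_t(0) = 0

```python
from functools import lru_cache

# Hypothetical toy instance: fare levels and demand model are made up for illustration.
PRICES = [50.0, 100.0, 150.0, 200.0]
MAX_PRICE = 250.0


def sale_probability(price):
    # At most one seat sells per day; cheaper fares sell more often.
    return max(0.0, 1.0 - price / MAX_PRICE)


def expected_revenue(price, days_left, seats_left):
    # Right-hand side of the Bellman equation for a single fare choice.
    q = sale_probability(price)
    sold = price + value(days_left - 1, seats_left - 1)
    unsold = value(days_left - 1, seats_left)
    return q * sold + (1.0 - q) * unsold


@lru_cache(maxsize=None)
def value(days_left, seats_left):
    """V_t(s): optimal expected revenue with t days and s seats left."""
    if days_left == 0 or seats_left == 0:
        return 0.0
    return max(expected_revenue(p, days_left, seats_left) for p in PRICES)


def best_price(days_left, seats_left):
    """Fare that attains the max in the Bellman equation (the optimal policy)."""
    return max(PRICES, key=lambda p: expected_revenue(p, days_left, seats_left))
```

Here the memoized table has only (days x seats) entries. With many fare classes and ticket types the state space explodes, which is where approximating V (or Q) with a simple function of hand-picked features comes in.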
As for why there's so much excitement about "deep RL" when it hasn't done anything substantive outside of games: I think it's because it has the potential to work in wildly different domains with minimal modification. We can already see some of this: OpenAI used the same training algorithm to play Dota 2 as they did to train a robotic hand to manipulate blocks.
I have no idea whether the hype will actually pan out, but that generality is worth being cautiously excited about, I think.
Researchers have used reinforcement learning techniques to train neural models with non-differentiable loss functions.
For instance, a common practice is to:
1. Train a translation model with the standard cross-entropy loss.
2. Fine-tune the model with reinforcement learning against BLEU scores.
BLEU is generally the metric that's reported in translation papers, but it can't be used directly as an objective function since it isn't differentiable.
Reinforcement learning can be used to estimate gradients from these types of objectives.
Game playing, recommendations, and ad suggestions are all examples of problems with non-differentiable objectives, but I think there are many more types of these problems that we haven't explored as deeply and that can potentially be solved with RL.
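The "fine-tune against BLEU" trick relies on a policy-gradient estimator: you never differentiate the metric, only the log-probability of the sampled output, weighted by how good the metric says it was. A minimal sketch of REINFORCE with a running-mean baseline, assuming a toy "model" that is just one independent softmax per output position, and a hypothetical `overlap_reward` (exact positional match) as a stand-in for BLEU; real BLEU uses clipped n-gram precision and a brevity penalty, and real systems sample whole translations from a seq2seq model.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def overlap_reward(candidate, reference):
    # Hypothetical non-differentiable stand-in for BLEU: fraction of exact positional matches.
    return sum(c == r for c, r in zip(candidate, reference)) / len(reference)


def reinforce(reference, vocab_size, steps=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    length = len(reference)
    # Toy "model": one independent categorical distribution per output position.
    logits = [[0.0] * vocab_size for _ in range(length)]
    baseline = 0.0
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        # Sample an output sequence from the current policy.
        sample = [rng.choices(range(vocab_size), weights=p)[0] for p in probs]
        reward = overlap_reward(sample, reference)  # only evaluated, never differentiated
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)  # running-mean baseline reduces variance
        for pos, token in enumerate(sample):
            for k in range(vocab_size):
                # d/d logit_k of log softmax(logits)[token] = 1[k == token] - p_k
                grad = (1.0 if k == token else 0.0) - probs[pos][k]
                logits[pos][k] += lr * advantage * grad
    return logits


def expected_reward(logits, reference):
    # Exact expected overlap under the per-position softmax policy.
    return sum(softmax(row)[r] for row, r in zip(logits, reference)) / len(reference)
```

Starting from uniform logits the expected reward is 1 / vocab_size; gradient ascent on `advantage * log_prob` pushes it up even though `overlap_reward` has no gradient.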
- Model-free methods have seen great success at learning high-dimensional tasks; however, they are sample-inefficient. In other words, they take too long to train on real robots. Examples of these methods are TRPO, PPO, ES, etc.
- Model-based methods are an order of magnitude more sample-efficient and thus more practical on real-world robots. However, these methods have high bias, and most working models are simple in terms of representational power, e.g. GPs, time-varying linear models, or mixtures of Gaussians. Examples are PILCO, GPS, PETS, etc.
Of course, the state of the art is a lot more complicated, but that's a short explanation for your observation.
RL is equivalent to optimal control in many real-world systems like robotics, self-driving vehicles, and other complex systems (aircraft control, etc.). There are lots of practical applications for RL, but it doesn't always work well; even simple problems can be deceptively complex to solve, to the point of being intractable, unstable, or both.
Serious A/B testing is usually done with a bandit algorithm these days; you can converge with far fewer trials than a statistically meaningful A/B test requires. News and other article recommendations are often bandits too, MSN being a prime example.
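The reason a bandit needs fewer trials is that it shifts traffic toward the better variant while the test is still running, instead of holding a fixed split until the end. A minimal sketch using Thompson sampling with Beta-Bernoulli posteriors, which is one common bandit algorithm (the comment doesn't say which one is typical); the conversion rates and visitor count are hypothetical simulation inputs.

```python
import random


def thompson_ab_test(true_rates, n_visitors, seed=0):
    """Simulate a Thompson-sampling A/B test; true_rates are hypothetical conversion rates."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_visitors):
        # Draw a plausible conversion rate for each variant from its Beta(1 + s, 1 + f) posterior.
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(n_arms)]
        # Show the variant whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Simulated visitor converts with the variant's true (unknown to the algorithm) rate.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    pulls = [successes[i] + failures[i] for i in range(n_arms)]
    return pulls, successes
```

For example, `thompson_ab_test([0.05, 0.10], 10000)` should send most of the traffic to the second variant, whereas a fixed 50/50 split would keep sending half the visitors to the worse one for the whole test.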
Are the only resources you're referencing those by others, or do you plan to include projects/lessons you yourself have made?
There has been a rise lately in Machine Learning/Deep Learning resources which have zero original projects or original ideas, just a list of papers/blog posts (or worse, MOOC teachers/YouTubers who do that and obfuscate the source of the original ideas). While that's an educational option, it's, in my opinion, lazy and avoids furthering the ecosystem as a whole.
My goal is to put together, for each week, theoretical material by experienced people (e.g. videos, papers, books, etc.) with projects done by myself.
> There has been a rise lately in Machine Learning/Deep Learning resources which have zero original projects or original ideas
Nah, not lazy imho. On one side, the barrier to just messing around is pretty low these days, but the barrier to original projects or ideas is quite steep without strong domain expertise and a nicely assembled dataset; on the other side, aren't the arXiv repository and the most relevant conferences possibly better suited to originality than the average list by Noob McNobody from his/her basement in Nowhere, Planet Earth?
> While that's an educational option, it's, in my opinion, lazy and avoids furthering the ecosystem as a whole.
I've missed HN's cynical take on things lately. There's absolutely nothing wrong with using those resources to learn. You say it's an educational "option", whereas the entire purpose is educational.
True, there's nothing wrong with using these resources, although I'm a bit disappointed when an "Awesome List of ML/DL" pops up every other week on HN with similar content/topics.
I am equally disappointed, but I realized that my disappointment should be directed towards the people upvoting such lists and not towards whoever is creating them because, on a personal level, such lists are useful. For the field as a whole, however, I agree with you.