Deep Reinforcement Learning in Depth in 60 Days

	Deep Reinforcement Learning in Depth in 60 Days (github.com)
	189 points by andri27 2853 days ago

4 comments

nafizh 2853 days ago

With all the excitement surrounding RL, I have yet to see substantial practical applications of RL in real life apart from games and a few articles I have read on how companies use RL for recommendations or ad suggestions.

So, as an outsider, it is indeed kind of puzzling what generates this excitement.

tikhonj 2853 days ago

Reinforcement learning is a solid fit for a number of traditional operations research problems. (In fact, that's pretty much what motivated research into RL in the first place, as I understand it.)

One concrete example I heard about was using reinforcement learning to price airline tickets. Behind the scenes, airlines break up the tickets on a single plane into a large number of distinctly priced types. The question of how much of each type of ticket to offer at what price and how to change this over time (as the actual flight is coming closer and closer) is a massive optimization problem that's too large to solve exactly. Reinforcement learning coupled with simulation can find good solutions if you set up the feature space correctly. (In this case, I remember that the only feature that ultimately mattered was either total profit or total revenue for the mix of tickets being offered.)

One thing to note here is that this is using "normal" reinforcement learning, not "deep" reinforcement learning. You can get away with having a simple functional approximation of the state instead of reaching for a neural network. This seems true for most operations research problems where you would reach for reinforcement learning—figuring out a way to model the state by hand works well enough and has the important benefit of being easier to understand and interpret. The "deep" part becomes useful when your state space is so large and complex that other techniques become infeasible.
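As a rough illustration of "normal" reinforcement learning coupled with simulation, here is a minimal sketch of tabular Q-learning on a toy seat-pricing simulator. Everything in it is an assumption made for illustration, not the airline system described above: the fare levels in `PRICES`, the purchase probabilities in `BUY_PROB`, the one-customer-per-day demand model, and the hand-modeled state (days left, seats left).

```python
import random

PRICES = [50, 100, 150]                    # hypothetical fare levels
BUY_PROB = {50: 0.9, 100: 0.6, 150: 0.2}   # assumed chance that the day's customer buys


def simulate_day(price, rng):
    """Toy demand model: one customer per day, buying with a price-dependent probability."""
    return rng.random() < BUY_PROB[price]


def train(days=3, seats=2, episodes=20000, epsilon=0.2, seed=0):
    """Tabular Q-learning against the simulator; returns the learned Q table."""
    rng = random.Random(seed)
    q = {}       # (days_left, seats_left, price) -> estimated remaining revenue
    visits = {}  # visit counts, used for a 1/n step size
    for _ in range(episodes):
        days_left, seats_left = days, seats
        while days_left > 0 and seats_left > 0:
            # Epsilon-greedy choice over the fare levels
            if rng.random() < epsilon:
                price = rng.choice(PRICES)
            else:
                price = max(PRICES, key=lambda p: q.get((days_left, seats_left, p), 0.0))
            sold = simulate_day(price, rng)
            reward = price if sold else 0
            next_days = days_left - 1
            next_seats = seats_left - (1 if sold else 0)
            # Bootstrap from the best action in the next state (zero once the episode ends)
            if next_days > 0 and next_seats > 0:
                future = max(q.get((next_days, next_seats, p), 0.0) for p in PRICES)
            else:
                future = 0.0
            key = (days_left, seats_left, price)
            visits[key] = visits.get(key, 0) + 1
            old = q.get(key, 0.0)
            q[key] = old + (reward + future - old) / visits[key]
            days_left, seats_left = next_days, next_seats
    return q


def best_price(q, days_left, seats_left):
    """Greedy fare for a given state under the learned Q table."""
    return max(PRICES, key=lambda p: q.get((days_left, seats_left, p), 0.0))
```

With one day and one seat left, the expected revenue per fare in this toy model is 45, 60, and 30, so `best_price(train(), 1, 1)` should settle on 100. A real problem would need a far richer simulator and some form of function approximation in place of the lookup table.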

currymj 2853 days ago

One of the core equations in reinforcement learning is the Bellman equation -- named after Richard Bellman, inventor of dynamic programming. And in fact, in the operations research community, reinforcement learning is often referred to as "approximate dynamic programming". There are lots of extremely boring, quite effective techniques for solving real industrial problems in this framework, without any neural networks at all.
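For readers who have not seen it, the Bellman optimality equation says V(s) = max over actions a of the sum over next states s' of P(s' | s, a) * (r + gamma * V(s')). A minimal sketch of exact dynamic programming (value iteration) on a tiny, made-up MDP follows; the states, rewards, and the `TOY_MDP` layout are hypothetical and only serve to show the backup. Approximate dynamic programming replaces the exact table with an approximation once the state space gets too large.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Solve V(s) = max_a sum_s' P(s'|s,a) * (r + gamma * V(s')) by repeated backups.

    transitions[state][action] is a list of (probability, next_state, reward);
    a state with no actions is terminal and keeps value 0.
    """
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:
                continue
            # Bellman backup: best expected one-step reward plus discounted future value
            best = max(
                sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best  # in-place update, reused later in the same sweep
        if delta < tol:
            return values


# Hypothetical three-state example: take 1 now, or wait one step for 10.
TOY_MDP = {
    "a": {"quit": [(1.0, "end", 1.0)], "go": [(1.0, "b", 0.0)]},
    "b": {"go": [(1.0, "end", 10.0)]},
    "end": {},
}
```

Calling `value_iteration(TOY_MDP)` gives V(b) = 10 and V(a) = max(1, 0.9 * 10) = 9, so waiting is the better choice from state "a".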

As for why so much excitement about "deep RL", when it hasn't done anything substantive outside of games -- I think it's because it has the possibility of working in wildly different domains with minimal modification. We can sort of see some of this already -- OpenAI used the same training algorithm to play DOTA2 as they did to train a robotic hand to manipulate blocks.

I have no idea whether the hype will actually pan out, but that generality is worth being cautiously excited about, I think.

czhu12 2853 days ago

Researchers have used reinforcement learning techniques to train neural models with non-differentiable loss functions.

For instance, a common practice is to:

1. Train a translation model with standard cross-entropy loss.
2. Fine-tune the model with reinforcement learning against BLEU scores.

BLEU is generally the metric that's reported in translation papers, but it can't be used as an objective function since it isn't differentiable.

Reinforcement learning can be used to generate gradients from these types of objectives.
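One common way to get such gradients is the score-function (REINFORCE) estimator: sample an output, score it, and move the policy parameters along (reward - baseline) * grad log pi(output). The sketch below is a toy under stated assumptions, not a translation pipeline: the policy is a softmax over three candidate outputs, and the hypothetical `black_box_score` lookup stands in for a metric like BLEU. The reward is only ever evaluated, never differentiated.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(reward_fn, n_actions, steps=2000, lr=0.1, seed=0):
    """REINFORCE with a running-average baseline; returns the final action probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * n_actions
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(n_actions), weights=probs)[0]
        reward = reward_fn(action)       # black box: evaluated, never differentiated
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for j in range(n_actions):
            # For a softmax policy, d log pi(action) / d logit_j = 1[j == action] - pi_j
            grad_log_prob = (1.0 if j == action else 0.0) - probs[j]
            logits[j] += lr * advantage * grad_log_prob
    return softmax(logits)


# Hypothetical scores for three candidate outputs, standing in for a metric like BLEU.
SCORES = [0.2, 0.9, 0.5]


def black_box_score(action):
    return SCORES[action]
```

Running `reinforce(black_box_score, 3)` should shift most of the probability mass onto the second candidate, the one with the highest score. In the fine-tuning setting described above, the softmax would be the pretrained model's output distribution and the logit update would be a backward pass through the network.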

Game playing, recommendations, and ad suggestions are all examples of problems with non-differentiable objectives, but there are many more problems of this type that I think we haven't explored as deeply and that could potentially be solved with RL.

inverse_pi 2853 days ago

Here's a two sentence summary of SOTA:

- Model-free methods have seen great success in learning high-dimensional tasks; however, they suffer from being sample-inefficient. In other words, they take too long to train on real robots. Examples of these methods are TRPO, PPO, ES, etc.

- Model-based methods are an order of magnitude more sample-efficient and thus more practical on real-world robots. However, these methods have high bias, and most working models are simple in terms of representational power, e.g. GP, time-varying linear, mixture of Gaussians. Examples are PILCO, GPS, PETS, etc.

Of course, SOTA is a lot more complicated, but this is a short explanation for your observation.

cgearhart 2853 days ago

RL is equivalent to optimal control in many real-world systems like robotics, self-driving vehicles, and other complex systems (aircraft control, etc.). There are lots of practical applications for RL, but it doesn't always work well; solving even simple problems can often be deceptively complex—to the point of being intractable, unstable, or both.

bojanbabic 2853 days ago

To my knowledge, not a single self-driving car company is using RL in production. It is more of a theoretical methodology than anything else.

gaius 2853 days ago

Serious A/B testing is usually done with a bandit algorithm these days; you can converge with far fewer trials than a statistically meaningful A/B test. News and other article recommendations are often bandits too, MSN being a prime example.
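To make the bandit idea concrete, here is a minimal sketch of Thompson sampling for Bernoulli (click / no click) variants. The choice of Thompson sampling is an assumption, since no specific algorithm is named above, and the conversion rates passed in are made up. Each round draws a plausible rate for every variant from its Beta posterior and shows the variant with the highest draw, so traffic drifts toward the better variants as evidence accumulates.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Bernoulli Thompson sampling; returns how often each variant was shown.

    true_rates holds the hidden conversion rates, used only to simulate user responses.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Sample a conversion rate per arm from its Beta(successes + 1, failures + 1) posterior
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda i: samples[i])
        pulls[arm] += 1
        # Simulated user response for the variant that was shown
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls
```

For example, `thompson_sampling([0.05, 0.10, 0.30], 5000)` should send the large majority of the 5000 impressions to the third variant, whereas a fixed three-way split would give each variant a third of the traffic regardless of performance.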

kleiba 2853 days ago

Spoken dialog systems.

Check out Steve Young's group at Cambridge; it's considered the state of the art (well, at least by some ;-)).

minimaxir 2853 days ago

Are the only resources you're referencing those by others, or do you plan to include projects/lessons you yourself have made?

There has been a rise lately in Machine Learning/Deep Learning resources which have zero original projects or original ideas, just a list of papers/blog posts (or worse, MOOC teachers/YouTubers who do that and obfuscate the source of the original ideas). While that's an educational option, it's, in my opinion, lazy and avoids furthering the ecosystem as a whole.

andri27 2853 days ago

My goal is to put together, for each week, theoretical material by experienced people (e.g. videos, papers, books, etc.) with projects done by myself.

DrNuke 2853 days ago

> There has been a rise lately in Machine Learning/Deep Learning resources which have zero original projects or original ideas

Nah, not lazy imho. On one side, the barrier to just messing around is pretty low these days, but the barrier to original projects or ideas is quite steep without strong domain expertise and a nicely assembled dataset; on the other side, the arXiv repository and the most relevant conferences are possibly more suited to originality than the average list by Noob McNobody from his/her basement in Nowhere, Planet Earth?

curiousgal 2853 days ago

> While that's an educational option, it's, in my opinion, lazy and avoids furthering the ecosystem as a whole.

I've missed HN's cynical take on things lately. There's absolutely nothing wrong with using those resources to learn. You say it's an educational "option" whereas the entire purpose is educational.

minimaxir 2853 days ago

True, there's nothing wrong with using these resources, although I'm a bit disappointed when an "Awesome List of ML/DL" pops up every other week on HN with similar content/topics.

I apologize for being overly cynical.

curiousgal 2853 days ago

I am equally disappointed, but I realized that my disappointment should be directed towards the people upvoting such lists and not towards whoever is creating them, because, on a personal level, such lists are useful. For the field as a whole, however, I agree with you.

lkhatter 2853 days ago

I’ll do this, please keep it updated as the weeks go by!

andri27 2853 days ago

Great! Yes, sure

gdsdfe 2853 days ago

Yeah I might follow along too, I need some motivation ...

guard0g 2853 days ago

DRL in financial derivative pricing, risk modeling, HFT. Check out Igor Halperin on Coursera.
