by gh1 1483 days ago
My experience matches yours. Recently, I was trying to solve an optimization problem using Deep RL. As usual, I had to run many experiments over several days using various tricks and hyperparameters. In the end, it turned out that something related to the symmetry of the action space made a huge difference in learning.

Anyhow, the experimentation stage requires a certain discipline and feels tedious at times. But the moment learning takes off feels great and, for me personally, makes up for the tedious phase before it.

It's certainly not fun for everyone, but I guess it could be fun for the target audience of the course (ML engineers/Data Scientists).

Regarding frameworks, my experience has been different. I find RLlib to be more modular and adaptable than SB3. But the learning curve is certainly steeper. The biggest differentiating factor for me is production readiness. Assuming that we are learning something in order to actually use it, I would recommend RLlib over SB3. The equation for researchers may be different though.


Have you ever encountered a situation where RL solved a (IRL "people paid me non-research-grant money for this") problem for you faster than classical controls engineering and/or planning? I have not.
Depends on what you mean by faster. Do you mean "time to solution" or "time to inference"? I think there are also other factors to weigh when judging the merit of a method, e.g. performance, robustness, ability to handle non-linearity, ability to solve the full online problem, etc.

When all these factors are taken into account, I have encountered situations where Deep RL performed better.

There are also very public examples of this e.g. Google's data center cooling [0] and competitive sailing [1].

[0] https://www.technologyreview.com/2018/08/17/140987/google-ju...

[1] https://www.mckinsey.com/business-functions/mckinsey-digital...

> Do you mean "time to solution" or "time to inference"?

I meant time to a real solution that works well enough to put into a product.

> There are also very public examples of this e.g. Google's data center cooling [0] and competitive sailing [1].

DeepMind really needed DRL wins on real problems.

McKinsey has a strong incentive to be able to say "we know all about the AI RL magic" (and all the better that it's in the context of an oligarchy's entry in a Rich Person Sport... such C-suite/investor class cred!)

In both cases, DRL was used because it was the right tool for the job. But, in both cases, proving DRL can be useful was the job! Go is a better example, but of course wasn't solving a real problem.

If you throw enough engineering time and compute at DRL, it can usually work well enough. (There is a real benefit to "just hack at it long enough" over "know the right bits of control theory".)