	by cyber_kinetist 1481 days ago
	To be honest though, the practical side of RL can be hit-and-miss in terms of "fun", depending on the person. It requires a lot of manual reward shaping, hyperparameter tuning, and general trial-and-error to make an agent do a seemingly simple task, and these tricks are applied more heuristically and haphazardly than you would expect from more "conventional" programming. It is fun for the right people (those who love tinkering with stuff and also have the perseverance to keep running RL experiments that can last hours or even days). But I would imagine many getting bored by the whole experience. (Pssst... I was one of them, and switched to doing something else in the middle of grad school.) By the way, RLlib is good if you want to try out simple experiments with well-established RL algorithms, but it's really awful to use when you want to modify the algorithm even just a little bit. So it's not bad for beginner-level tutorials, but once you have the basics down it might become very frustrating. I would recommend simpler frameworks like Stable Baselines 3 (https://stable-baselines3.readthedocs.io/en/master/) for a much more stable experience, if you have a fair bit of Python/ML programming skill and don't have trouble reading well-maintained library code.

2 comments

avna98 1481 days ago

RLlib maintainer here -- We've been in the process of making many API changes over the past couple months to make it easy to modify or implement custom algorithms. The full set of changes and updated docs will be released along with ray 2.0 in August!

cyber_kinetist 1481 days ago

Ah, good to meet you here. I had experience using RLlib while participating in research back in grad school (which eventually became a SIGGRAPH conference paper this year!), and I've even sent some small pull requests before (with a different ID). Sorry if this is a bit of an off-topic comment, but I want to share some inconveniences I experienced while using RLlib:

- The framework seems to be built mainly on the assumption that it will run on a cloud platform like AWS/Azure. However, many researchers use HPC-type clusters, which are quite different from these cloud setups, and I found RLlib's support for them lackluster. (In our case we had 4 16-core Xeon CPUs and 1 V100 GPU per node, with multiple nodes connected via Infiniband, CentOS 7 / OpenHPC installed, and job control done via SLURM.) It was quite disappointing to find out that the framework didn't support Infiniband communication at all, since these interconnects are really costly to have (for good reason!). I also found allocating workers based on lower-level details like CPU affinity/NUMA very cumbersome, since the API assumes you want your workers "auto-assigned" instead of manually "pinned" for the highest performance. (The last time I used RLlib I looked at placement groups to do this, but found them too confusing.) Running your environments NUMA-aware can be crucial for performance when you're running heavy custom-made environments in C++. I did some experiments and found that parallelizing the environment on the C++ side (via threading) on each NUMA node was much faster than blindly running one process per physical CPU core, which is what RLlib defaults to. (You can hack a bit and write your VecEnv on the C++ side, but this breaks lots of assumptions RLlib makes and creates a whole lot of other issues in the code.) Seeing promising solutions like Envpool (https://github.com/sail-sg/envpool) coming up, I think these issues with parallelizing environments can be improved.

- As I've said before, the framework makes simple and established things very easy, but becomes very hard to use when you try to do anything custom, like modifying RL algorithms to fit your research. All I needed to do was modify the PPO algorithm to run a custom learning step inside each epoch, and I still found it surprisingly hard. Using the whole declarative "Observable-like" API approach to write RL code in Python was incredibly painful, since you have no way to debug any of your code, and no idea whether your code is correct until you run the whole RL pipeline and, 30 minutes into training, get a strange TypeError. (Got some horror flashbacks from when I was using modern JS and Angular, but in a much worse form.) I get the feeling that the overall codebase is incredibly complex, uses too many weird dark Python metaprogramming tricks, and is a pain to navigate and extend compared to much cleaner solutions like Stable Baselines 3... (they aren't as "general" a solution as RLlib, but can be more easily modified towards one's needs). Maybe my needs were a bit special, so it might have been much better if I had hand-rolled my PPO implementation with torch.distributed... (if I just had more time...)
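To make the pinning point in the first item concrete, here is a minimal sketch of assigning one environment worker per NUMA node and restricting it to that node's cores. It is not RLlib API: `cores_per_numa_node` and `pin_current_process` are hypothetical helper names, and the contiguous core numbering is an assumption that should be checked against `lscpu` on the real machine. `os.sched_setaffinity` is Linux-only, so the sketch falls back gracefully elsewhere.

```python
import os
from typing import Dict, Set


def cores_per_numa_node(num_nodes: int, cores_per_node: int) -> Dict[int, Set[int]]:
    """Map each NUMA node id to a block of logical core ids.

    Assumption: cores are numbered contiguously per node. Real machines
    often interleave them, so verify the layout before relying on this.
    """
    return {
        node: set(range(node * cores_per_node, (node + 1) * cores_per_node))
        for node in range(num_nodes)
    }


def pin_current_process(cores: Set[int]) -> bool:
    """Restrict the calling process to `cores`; return False if not possible."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # platform without Linux-style CPU affinity
    usable = cores & os.sched_getaffinity(0)  # respect limits set by the scheduler
    if not usable:
        return False
    os.sched_setaffinity(0, usable)
    return True


# Layout for the machine described above: 4 sockets with 16 cores each.
layout = cores_per_numa_node(num_nodes=4, cores_per_node=16)
```

Each worker process would call `pin_current_process(layout[node_id])` at startup, before the C++ environment spawns its threads, since on Linux new threads inherit the affinity mask of the thread that creates them. Memory placement is a separate concern that this sketch does not address.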
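For contrast with the declarative style described in the second item, here is a rough sketch of what "a custom step inside each epoch" looks like in a plain imperative loop. It is not RLlib or Stable Baselines 3 code: `run_epochs` and the `custom_step` hook are hypothetical names, and the gradient update is left out entirely. Only the clipped surrogate objective itself follows the standard PPO definition.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (probability ratio, advantage estimate)


def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO clipped surrogate for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


def run_epochs(
    batch: Sequence[Sample],
    num_epochs: int,
    custom_step: Optional[Callable[[int, float], None]] = None,
) -> List[float]:
    """Plain-loop skeleton that returns the mean objective of each epoch."""
    history: List[float] = []
    for epoch in range(num_epochs):
        mean_obj = sum(ppo_clip_objective(r, a) for r, a in batch) / len(batch)
        # A real implementation would take a gradient step on -mean_obj here.
        if custom_step is not None:
            custom_step(epoch, mean_obj)  # hypothetical hook for research code
        history.append(mean_obj)
    return history
```

Because every step is an ordinary function call, a breakpoint or print statement inside `custom_step` fires on the first epoch, which is the debuggability the comment is asking for.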

But still, your framework did help tremendously in our research; we wouldn't have finished the paper without it. These were just some lamentations from a former grad student who was struggling with these issues some years ago. (I'm not doing any reinforcement learning nowadays, but many people would certainly benefit from these improvements.)

gh1 1481 days ago

My experience matches yours. Recently, I was trying to solve an optimization problem using Deep RL. As usual, I had to run many experiments over several days using various tricks and hyperparameters. Finally, it turned out something related to the symmetry of the action space made a huge difference in learning.
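The comment doesn't say what the symmetry issue actually was. One common instance, offered here purely as an assumption, is a continuous action space that isn't centered on zero. The Stable Baselines 3 tips recommend making continuous action spaces symmetric, typically [-1, 1], and mapping back to the real bounds inside the environment. A minimal sketch of that mapping, with hypothetical helper names:

```python
from typing import List, Sequence


def unscale_action(
    action: Sequence[float], low: Sequence[float], high: Sequence[float]
) -> List[float]:
    """Map each component from the symmetric range [-1, 1] to [low_i, high_i]."""
    return [lo + 0.5 * (a + 1.0) * (hi - lo) for a, lo, hi in zip(action, low, high)]


def scale_action(
    action: Sequence[float], low: Sequence[float], high: Sequence[float]
) -> List[float]:
    """Inverse mapping: from [low_i, high_i] back to [-1, 1]."""
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x, lo, hi in zip(action, low, high)]
```

The policy then only ever sees actions in [-1, 1], and the environment calls `unscale_action` before applying them.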

Anyhow, the experimentation stage requires a certain discipline and feels tedious at times. But the moment when learning takes off, it feels great, and for me personally, compensates for the tedious phase before.

It's certainly not fun for everyone, but I guess it could be fun for the target audience of the course (ML engineers/Data Scientists).

Regarding frameworks, my experience has been different. I find RLlib to be more modular and adaptable than SB3. But the learning curve is certainly steeper. The biggest differentiating factor for me is production readiness. Assuming that we are learning something in order to actually use it, I would recommend RLlib over SB3. The equation for researchers may be different though.

InefficientRed 1481 days ago

Have you ever encountered a situation where RL solved a (IRL "people paid me non-research-grant money for this") problem for you faster than classical controls engineering and/or planning? I have not.

gh1 1481 days ago

Depends on what you mean by faster. Do you mean "time to solution" or "time to inference"? I think there are also more factors to take into consideration when considering the merit of the method e.g. performance, robustness, ability to handle non-linearity, ability to solve the full online problem etc.

When all these factors are taken into account, I have encountered situations where Deep RL performed better.

There are also very public examples of this e.g. Google's data center cooling [0] and competitive sailing [1].

[0] https://www.technologyreview.com/2018/08/17/140987/google-ju...
[1] https://www.mckinsey.com/business-functions/mckinsey-digital...

InefficientRed 1481 days ago

> Do you mean "time to solution" or "time to inference"?

I meant time to a real solution that works well enough to put into a product.

> There are also very public examples of this e.g. Google's data center cooling [0] and competitive sailing [1].

DeepMind really needed DRL wins on real problems.

McKinsey has a strong incentive to be able to say "we know all about the AI RL magic" (and all the better that it's in the context of an oligarchy's entry in a Rich Person Sport... such C-suite/investor class cred!)

In both cases, DRL was used because it was the right tool for the job. But, in both cases, proving DRL can be useful was the job! Go is a better example, but of course wasn't solving a real problem.

If you throw enough engineering time and compute at DRL, it can usually work well enough. (There is a real benefit to "just hack at it long enough" over "know the right bits of control theory".)
