|
Ah, good to meet here. I had experience using RLlib while participating in research back at grad school (which eventually became a SIGGRAPH conference paper this year!), and I've even sent some small pull requests before (with a different ID). Sorry if this is a bit of an off-topic comment, but I want to share some inconveniences I've experienced during using RLlib: - The framework seems to be mainly built on the assumption that it is going to be run on a cloud machine like AWS/Azure. However, many researchers use HPC-type cluster machines which are far different from these cloud setups, and I found support for it to be lackluster in RLlib. (In our case we had 4 16-core Xeon CPUs and 1 V100 GPU per node, with multiple nodes connected via Infiniband, with CentOS 7 / OpenHPC installed and job control done via SLURM) It was quite disappointing to found out that the framework didn't support Infiniband communication at all, since these are really costly to have (for good reason!). I also found that allocating workers based on lower-level details like affinity/NUMA to be very cumbersome, since the API assumes you want to "auto-assign" your workers automatically instead of "pinning" it manually for the highest performance. (The last time I've used RLlib I looked at placement groups to do this but found it too confusing.) Running your environments NUMA-aware can be crucial for having the best performance when you're running heavy custom-made environments in C++. I did some experiments and found out that parallelizing the environment on the C++ side (via threading) on each NUMA mode was much faster than blindly running one process per physical CPU core (which is what RLlib defaults to. You can hack a bit and write your VecEnv on the C++ side but this messes up lots of assumptions RLlib makes and creates a whole lot of other issues in the code.) Seeing promising solutions like Envpool (https://github.com/sail-sg/envpool) coming up I think these issues with parallelizing environments can be improved. - As I've said before, the framework is very easy to do simple and established things, but becomes very hard when you try to do anything custom, like modifying RL algorithms to fit in your research. What I needed to do was to simply modify the PPO algorithm to do some custom learning step inside each epoch, and still found it surprisingly hard. Using the whole declarative "Observable-like" API approach to write RL code in Python was incredibly painful, since you have no way to debug any of your code, and also have no idea that your code is correct until you run your whole RL pipeline until 30 minutes into your training you get a strange TypeError. (Got some of the horror flashbacks from when I was using modern JS and Angular, but in a much worse form) I get the feeling that the overall codebase is incredibly complex, uses too many weird dark Python metaprogramming tricks, and is a pain to navigate and extend, compared to other much cleaner solutions like Stable Baselines 3... (they aren't as "general" of a solution as RLlib, but can be more easily modified towards one's needs). Maybe my needs were a bit special, so it might have been much better if I had hand-rolled my PPO implementation with torch.distributed... (if I just had more time...) But still, your framework did help tremendously in our research, we wouldn't have finished the paper without it. These were just some lamentations from a formerly-grad school student who was struggling with these issues some years ago. (I'm not doing any reinforcement learning nowadays, but many people would certainly benefit from these improvements.) |