by RSchaeffer 2644 days ago
This is going to sound cynical, but I recently invested a week in RLlib for a project before discovering that much of the under-the-hood implementation was horribly confusing, poorly documented, and missing critical functionality (for instance, their IMPALA implementation only works with discrete action spaces). Does this library conceal similar problems?
3 comments

Our IMPALA currently also supports only discrete action spaces, because we wanted to replicate DeepMind's implementation exactly for benchmarking. In your case I'd suggest looking into our SAC implementation, which learned typical continuous-action benchmarks (e.g. Pendulum-v0) in a few dozen episodes.

Regarding code quality and ease of use, we follow a strict modular approach with separate components that can be tested individually. Component dataflow is defined on an abstract level, which makes it rather easy to create new components and algorithms. So, instead of having to adjust complex code structures with lots of intertwined behavior, you usually can just plug in another component that covers your use case.
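
To make the "plug in another component" idea concrete, here is a minimal sketch in plain Python. Every name in it (Component, Scale, Clip, Pipeline) is hypothetical and invented for illustration; it is not the API of the library discussed here, only one way such a modular design could look.

```python
from typing import List


# Hypothetical names throughout: this illustrates pluggable components in
# general, not the actual API of any library mentioned in this thread.
class Component:
    """A unit with one well-defined input and output, testable on its own."""

    def __call__(self, x: List[float]) -> List[float]:
        raise NotImplementedError


class Scale(Component):
    """Multiplies every value by a fixed factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def __call__(self, x: List[float]) -> List[float]:
        return [v * self.factor for v in x]


class Clip(Component):
    """Clamps every value into the closed interval [low, high]."""

    def __init__(self, low: float, high: float) -> None:
        self.low = low
        self.high = high

    def __call__(self, x: List[float]) -> List[float]:
        return [min(max(v, self.low), self.high) for v in x]


class Pipeline(Component):
    """Dataflow declared abstractly as an ordered list of components."""

    def __init__(self, stages: List[Component]) -> None:
        self.stages = stages

    def __call__(self, x: List[float]) -> List[float]:
        for stage in self.stages:
            x = stage(x)
        return x


# Each component can be checked in isolation...
assert Scale(2.0)([1.0, -3.0]) == [2.0, -6.0]
assert Clip(-1.0, 1.0)([2.0, -6.0]) == [1.0, -1.0]

# ...and a different behavior comes from swapping a stage, not editing one.
bounded = Pipeline([Scale(2.0), Clip(-1.0, 1.0)])
unbounded = Pipeline([Scale(2.0)])
```

Since Pipeline is itself a Component, pipelines can be nested, which is one simple way to keep the dataflow definition separate from what each stage does.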

Regarding our IMPALA implementation: it currently also supports only discrete actions. However, our SAC algorithm is extremely strong. It learned (continuous) Pendulum-v0 within only a few dozen episodes, so you could try that one instead. As for ease of use: we believe our code is quite user-friendly (take a look at our example scripts and configs) and also easy to extend, thanks to the strictly enforced modularity of our components and our abstract data flow definitions inside an algorithm.
I also found the under-the-hood implementation extremely hard to understand and extend. I couldn't grasp how the separation of concerns was split between the different classes. The documentation lacks examples of how any one of the algorithms (e.g. IMPALA, SAC, PPO) was built from scratch.
Can you recommend a better alternative to RLlib that you have experience with?