by codesourcerer 2655 days ago
Regarding our IMPALA implementation: it currently also supports only discrete actions. However, our SAC algo is extremely strong. It learned (continuous) Pendulum-v0 within only a few dozen episodes, so you could try that one instead. As for ease of use: we believe our code is quite user-friendly (take a look at our example scripts and configs) and also easily extensible, thanks to the strictly enforced modularity of our components and the abstract data-flow definitions inside each algorithm.
1 comment

I also found it extremely hard to understand and extend the under-the-hood implementation. I couldn't grasp how concerns were separated across the different classes. The documentation lacks examples showing how one of the algorithms (e.g. IMPALA, SAC, PPO) is built from scratch.