by k_f 2651 days ago
Our IMPALA implementation also currently supports only discrete action spaces, because we wanted to replicate DeepMind's implementation exactly for benchmarking. In your case I'd suggest looking into our SAC implementation, which learned typical continuous-action benchmarks (e.g. Pendulum-v0) in a few dozen episodes.

Regarding code quality and ease of use, we follow a strictly modular approach with separate components that can be tested individually. Component dataflow is defined at an abstract level, which makes it fairly easy to create new components and algorithms. So instead of having to adjust complex code structures with lots of intertwined behavior, you can usually just plug in another component that covers your use case.
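
To make the plug-in idea concrete, here is a rough sketch in plain Python. It is not the library's actual API: the names Component, Scale, Clip, and Pipeline are hypothetical and exist only for this illustration. It just shows small units that can be tested on their own and swapped without touching the rest.

```python
from typing import List, Sequence


class Component:
    """Hypothetical base class: a named unit that transforms a list of floats."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __call__(self, values: Sequence[float]) -> List[float]:
        raise NotImplementedError


class Scale(Component):
    """Multiplies every value by a constant factor."""

    def __init__(self, factor: float) -> None:
        super().__init__("scale")
        self.factor = factor

    def __call__(self, values: Sequence[float]) -> List[float]:
        return [v * self.factor for v in values]


class Clip(Component):
    """Clamps every value into the closed interval [low, high]."""

    def __init__(self, low: float, high: float) -> None:
        super().__init__("clip")
        self.low = low
        self.high = high

    def __call__(self, values: Sequence[float]) -> List[float]:
        return [min(max(v, self.low), self.high) for v in values]


class Pipeline(Component):
    """Chains components; the dataflow is just the order of the stages."""

    def __init__(self, stages: Sequence[Component]) -> None:
        super().__init__("pipeline")
        self.stages = list(stages)

    def __call__(self, values: Sequence[float]) -> List[float]:
        out = list(values)
        for stage in self.stages:
            out = stage(out)
        return out


# Each component can be checked in isolation ...
scaled = Scale(2.0)([0.5, -3.0])
clipped = Clip(-1.0, 1.0)([1.5, -6.0, 0.25])

# ... and covering a different use case means swapping one stage, not editing the others.
preprocess = Pipeline([Scale(2.0), Clip(-1.0, 1.0)])
result = preprocess([0.5, 0.75, -3.0])
```

Replacing Clip with some other Component subclass changes the behavior of the pipeline without any change to Scale or Pipeline, which is the property described above.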