Cool! I'd also like to plug my own RL-related repositories: https://github.com/rldotai/rl-algorithms and https://github.com/rldotai/mdpy .

The first one implements some of the more "exotic" temporal difference learning algorithms (Gradient, Emphatic, Direct Variance) with links to the associated papers. It's in Python and heavily documented.
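To give a feel for what a gradient-TD method looks like, here is a minimal sketch of linear TDC (TD with gradient correction) for the on-policy case. It is an illustration only, not code from the rl-algorithms repository. The `TDC` class, its constructor arguments, and the `update` method are hypothetical names. It assumes feature vectors are NumPy float arrays and uses a constant discount `gamma`.

```python
import numpy as np

class TDC:
    """Linear TDC (gradient-TD family), on-policy.
    Hypothetical minimal interface, not the rl-algorithms repository's API."""

    def __init__(self, n_features, alpha, beta, gamma):
        self.theta = np.zeros(n_features)  # value-function weights
        self.w = np.zeros(n_features)      # auxiliary weights tracking E[delta | features]
        self.alpha = alpha                 # step size for theta
        self.beta = beta                   # step size for w
        self.gamma = gamma                 # discount factor (assumed constant)

    def update(self, x, reward, x_next):
        """Apply one TDC update for the transition x -> x_next; return the TD error."""
        delta = reward + self.gamma * self.theta @ x_next - self.theta @ x
        # The gradient-correction term uses the current (pre-update) auxiliary weights.
        self.theta += self.alpha * (delta * x - self.gamma * x_next * (x @ self.w))
        # LMS-style update of w toward the expected TD error.
        self.w += self.beta * (delta - x @ self.w) * x
        return delta

# Usage: a single update on a two-feature transition.
agent = TDC(n_features=2, alpha=0.1, beta=0.2, gamma=0.9)
delta = agent.update(np.array([1.0, 0.0]), 1.0, np.array([0.0, 1.0]))
print(delta, agent.theta, agent.w)
```

Plain linear TD(0) would use only the `delta * x` term. The extra `-gamma * x_next * (x @ w)` correction is what distinguishes TDC and what the auxiliary weights `w` exist to estimate.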

The second one (mdpy) has code for analyzing MDPs (with a particular focus on RL), so you can look at what the algorithms' solutions would be under linear function approximation. I wrote it when I was trying to get a feel for what the math meant, and I continue to find it helpful, particularly when I'm dubious about the results of some calculation.
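As a rough example of this kind of calculation, consider the fixed point of linear TD(0). The sketch below is plain NumPy, not mdpy's actual API, and `stationary_distribution` and `td_fixed_point` are hypothetical helper names. It assumes an ergodic Markov chain with transition matrix `P`, expected per-state rewards `r`, a feature matrix `Phi` (one row per state), and weighting by the on-policy stationary distribution `d`. The TD fixed point solves `Phi^T D (I - gamma P) Phi theta = Phi^T D r` with `D = diag(d)`.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution d of an ergodic chain: d @ P == d and sum(d) == 1."""
    n = P.shape[0]
    # Solve (P^T - I) d = 0, replacing one redundant equation with sum(d) = 1.
    A = P.T - np.eye(n)
    A[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

def td_fixed_point(P, r, Phi, gamma):
    """Weights theta with Phi^T D (I - gamma P) Phi theta = Phi^T D r, D = diag(d)."""
    n = P.shape[0]
    D = np.diag(stationary_distribution(P))
    A = Phi.T @ D @ (np.eye(n) - gamma * P) @ Phi
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)

# A small 3-state chain that pays reward 1 in the last state.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([0.0, 0.0, 1.0])
gamma = 0.9

# With tabular (identity) features the TD solution matches the true values.
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)
print(np.allclose(td_fixed_point(P, r, np.eye(3), gamma), v_true))  # True

# With a single constant feature, TD can only represent one shared value.
print(td_fixed_point(P, r, np.ones((3, 1)), gamma))
```

Comparing the approximate solution against `v_true` is a quick sanity check on what a learning algorithm ought to converge to before trusting the output of a long run.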