


	by viraptor 776 days ago
	In case you want to expand to more chapters one day: there are lots of tutorials on doing the simple things that have been verified to work, but when I'm struggling it's normally with something people barely ever mention - what to do when things go wrong. For example, your actions just consistently get stuck at the maximum. Or the exploration doesn't kick in, regardless of how noisy you make the off-policy training. Or ... I wish there were more practical resources for when you've got the basics usually working, but suddenly hit issues nobody really talks about (beyond "just tweak some stuff until it works", anyway).


alessiodm 776 days ago

Thanks a lot, and another great suggestion for improvement. I also found that the common advice is "tweak hyperparameters until you find the right combination". That can definitely help. But issues usually hide in different "corners": the problem space and its formulation, the algorithm itself (e.g., different random seeds alone can cause big variance in performance), and more.

As you mentioned, in real applications of DRL things tend to go wrong more often than right: "it doesn't work just yet" [1]. And my short tutorial is definitely lacking in the areas of troubleshooting, tuning, and "productionisation". If I carve out time for an expansion, this will likely be at the top of the list. Thanks again.

[1] https://www.alexirpan.com/2018/02/14/rl-hard.html


ubj 776 days ago

Thanks for sharing [1], that was a great read. I'd be curious to see an updated version of that article, since it's about 6 years old now. For example, Boston Dynamics has transitioned from MPC to RL for controlling its Spot robots [2]. Davide Scaramuzza, whose team created autonomous FPV drones that beat expert human pilots, has also discussed how his team had to transition from MPC to RL [3].

[2]: https://bostondynamics.com/blog/starting-on-the-right-foot-w...

[3]: https://www.incontrolpodcast.com/1632769/13775734-ep15-david...


alessiodm 776 days ago

Thank you for the amazing links as well! You are right that the article [1] is 6 years old now, and indeed the field has evolved. But the algorithms and techniques I share in the GitHub repo are the "classic" ones (dating from back then too), for which that post is still relevant - at least from a historical perspective.

You bring up a very good point though: more recent advancements and assessments should be linked and/or mentioned in the repo (e.g., in the resources and/or an appendix). I will try to do that sometime.
