| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by alessiodm 817 days ago

Thanks a lot, and another great suggestion for improvement. I also found that the common advice is "tweak hyperparameters until you find the right combination". That can definitely help. But usually issues hide in different "corners", both of the problem space and its formulation, the algorithm itself (e.g., just different random seeds have big variance in performance), and more.

As you mentioned, in real applications of DRL things tend to go wrong more often than right: "it doesn't work just yet" [1]. And my short tutorial definitely lacks in the area of troubleshooting, tuning, and "productionisation". If I carve time for expansion, this will likely make top of list. Thanks again.

[1] https://www.alexirpan.com/2018/02/14/rl-hard.html

1 comments

ubj 817 days ago

Thanks for sharing [1], that was a great read. I'd be curious to see an updated version of that article, since it's about 6 years old now. For example, Boston Dynamics has transitioned from MPC to RL for controlling its Spot robots [2]. Davide Scaramuzza, whose team created autonomous FPV drones that beat expert human pilots, has also discussed how his team had to transition from MPC to RL [3].

[2]: https://bostondynamics.com/blog/starting-on-the-right-foot-w...

[3]: https://www.incontrolpodcast.com/1632769/13775734-ep15-david...

link

alessiodm 817 days ago

Thank you for the amazing links as well! You are right that the article [1] is 6 years old now, and indeed the field has evolved. But the algorithms and techniques I share in the GitHub repo are the "classic" ones (dating back then too), for which that post is still relevant - at least from an historical perspective.

You bring up a very good point though: more recent advancements and assessments should be linked and/or mentioned in the repo (e.g., in the resources and/or an appendix). I will try to do that sometime.

link