by YeGoblynQueenne 858 days ago
>> Usually I see some signs of learning, but it fails to come up with anything spectacular.

And even if it succeeds, it fails again as soon as you change the environment, because RL doesn't generalise. At all. It's kind of shocking, to be honest.

https://robertkirk.github.io/2022/01/17/generalisation-in-re...