And even if it succeeds, it fails again as soon as you change the environment, because RL doesn't generalise. At all. It's kind of shocking, to be honest.
https://robertkirk.github.io/2022/01/17/generalisation-in-re...