Hacker News new | ask | show | jobs
by laquition 2029 days ago
Stagnation is a known problem in reinforcement learning and similar methods. It's very easy to get stuck at a local maximum. My favorite fun example is https://gym.openai.com/envs/BipedalWalkerHardcore-v2/ where a standard DDPG(https://arxiv.org/abs/1509.02971) will get stuck at pits in the environment. Although it could get a higher score if it learned to jump, there is a penalty with falling in that makes it stabilize on standing still and running out the timer. Video: https://www.youtube.com/watch?v=DEGwhjEUFoI

I suspect there is something similar going on with video/music recommendations. When a bad novel suggestion is made the penalty is likely too high to overcome (User immediately clicks off) with traditional reinforcement methods.