by sorenjan 1158 days ago
When I learned about RL, we were taught to disable exploration when evaluating the model, since the exploration part is stochastic. I don't think that would work in production.
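
For concreteness, here is a rough sketch of what "disabling exploration" at evaluation time usually looks like, assuming an epsilon-greedy policy over a small list of Q-values. The comment doesn't say which exploration scheme was taught, so that choice is an assumption, and `select_action` and `q_values` are made-up names for illustration.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    # rng.random() is in [0, 1), so epsilon == 0.0 never takes the random branch.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical Q-values for a single state with three actions.
q_values = [0.1, 0.7, 0.2]

train_action = select_action(q_values, epsilon=0.1)  # occasionally random
eval_action = select_action(q_values, epsilon=0.0)   # deterministic, greedy
```

With `epsilon=0.0` the same state always yields the same action, which makes evaluation runs repeatable. It also means the policy never tries anything new.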