by gyrovagueGeist 380 days ago
Does anyone know why they added minibatch advantage normalization (or when it can be useful)?

The paper they cite, "What matters in on-policy RL", claims it doesn't make much difference on their suite of test problems, and normalizing by the minibatch mean doesn't seem theoretically motivated for convergence to the optimal policy?
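For anyone unsure what is being asked about, here is a minimal sketch of one common form of per-minibatch advantage normalization, using only the Python standard library. The function name `normalize_advantages`, the `eps` value, and the sample numbers are my own assumptions for illustration, not taken from the linked code or the paper.

```python
import statistics

def normalize_advantages(advantages, eps=1e-8):
    """Standardize advantages using statistics of this minibatch only.

    Hypothetical helper: subtracts the minibatch mean and divides by the
    minibatch (population) standard deviation, with eps for stability.
    """
    mean = statistics.fmean(advantages)
    std = statistics.pstdev(advantages)
    return [(a - mean) / (std + eps) for a in advantages]

# Made-up minibatch in which every advantage estimate is positive.
minibatch = [1.0, 2.0, 3.0, 6.0]
normalized = normalize_advantages(minibatch)

for raw, norm in zip(minibatch, normalized):
    print(f"{raw:+.3f} -> {norm:+.3f}")
```

In this made-up minibatch every raw advantage is positive, yet after normalization the below-average ones come out negative. The shift depends on which samples happened to land in the minibatch, which is one way to read the concern about the mean subtraction not being obviously motivated.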

1 comment

Tbh I'm unsure as well. I only skimmed the paper, so if I find anything I'll post it here!