HN Mirror



	by gyrovagueGeist 380 days ago
	Does anyone know why they added minibatch advantage normalization (or when it can be useful)? The paper they cite, "What matters in on-policy RL", claims it does not make much difference on their suite of test problems, and (mean-of-minibatch) normalization doesn't seem theoretically motivated for convergence to the optimal policy?
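	For anyone unsure what is being asked about, here is a minimal sketch of what minibatch advantage normalization usually looks like. This is an assumption about the general technique, not the code from the project under discussion; the function name `normalize_advantages` and the `eps` value are made up for illustration.

```python
from statistics import fmean, pstdev


def normalize_advantages(advantages, eps=1e-8):
    """Shift and scale one minibatch of advantages to zero mean and unit std.

    Hypothetical helper: the statistics come from the minibatch itself,
    so the same sample can get a different value in a different minibatch.
    """
    mean = fmean(advantages)
    std = pstdev(advantages, mu=mean)
    # eps guards against division by zero when all advantages are equal
    return [(a - mean) / (std + eps) for a in advantages]


# Every raw advantage here is positive...
minibatch = [1.0, 2.0, 3.0, 4.0]
normalized = normalize_advantages(minibatch)

# ...but after normalization the below-average ones become negative.
print(normalized)
```

	Note how the sign of some advantages flips and the scale depends on whatever else landed in the minibatch, which is the part that does not look obviously motivated by the policy gradient theory.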

1 comment

Tbh I'm unsure as well. I only skimmed the paper, so if I find anything I'll post it here!