by gyrovagueGeist
380 days ago
Does anyone know why they added minibatch advantage normalization, or when it is useful? The paper they cite, "What matters in on-policy RL", reports that it makes little difference on its suite of test problems, and normalizing by the minibatch mean doesn't seem theoretically motivated for convergence to the optimal policy.
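For anyone unfamiliar with the term, here is a minimal sketch of what per-minibatch advantage normalization typically looks like, assuming NumPy. The function name `normalize_advantages` and the `eps` value are illustrative choices, not taken from the library under discussion.

```python
import numpy as np

def normalize_advantages(advantages, eps=1e-8):
    """Standardize advantages using the statistics of this minibatch only.

    Hypothetical helper for illustration; eps guards against division by zero
    when every advantage in the minibatch is identical.
    """
    adv = np.asarray(advantages, dtype=np.float64)
    return (adv - adv.mean()) / (adv.std() + eps)

# A minibatch in which every sampled action was better than the baseline.
adv = np.array([1.0, 2.0, 3.0, 4.0])
norm = normalize_advantages(adv)

# After subtracting the minibatch mean, half of the advantages are negative,
# so those actions are now pushed down rather than up.
num_negative = int((norm < 0).sum())
```

Note that the sign of an advantage can flip depending on which other samples land in the same minibatch, which is one way to see the theoretical concern raised above.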