|
|
|
|
|
by schattschneider
743 days ago
|
|
This article doesn't describe it in detail. One scenario imaginable would be that they ran their model trained on non-full moon data for an evaluation on a full moon day. Which means the model would simply apply it's learned "optimal" action policy in a different environment, where the previously learned action policy doesn't lead to good scores anymore. |
|