by Eridrus
3 days ago
I assume it's a lack of care when RLing them. RL has a tendency to reinforce cheating when the cheats are easier to find than the final solution. So when making your RL environment, you need to spend a lot of effort on finding ways the model can cheat and penalizing them.
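
To make that concrete, here is a rough sketch of what "penalizing the cheats" could look like in a reward function for a coding task. Everything in it is made up for illustration: the `Episode` fields, the two cheat flags, and the `CHEAT_PENALTY` value are assumptions, not anything a particular lab or library is known to use. The hard part in practice is the detection behind those flags, which this sketch takes as given.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical summary of one rollout in a coding environment."""
    tests_passed: int
    tests_total: int
    modified_test_files: bool  # assumed cheat: the model edited the tests
    hardcoded_outputs: bool    # assumed cheat: the model special-cased expected outputs


CHEAT_PENALTY = 1.0  # assumed value; would need tuning per environment


def reward(ep: Episode) -> float:
    """Return the pass rate, or a negative reward if any cheat was detected."""
    cheats = int(ep.modified_test_files) + int(ep.hardcoded_outputs)
    if cheats:
        # A detected cheat must score worse than an honest failure,
        # otherwise cheating is still the easier path to reward.
        return -CHEAT_PENALTY * cheats
    if ep.tests_total == 0:
        return 0.0
    return ep.tests_passed / ep.tests_total
```

The point of the design is the ordering: an honest run that fails every test gets 0, while a run that "passes" everything by editing the tests gets a negative reward, so the cheat is never the better option once it is detected.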