a cool idea, except that battling actually doesn't even matter to the ai. if you look at what the agent is doing during a battle, it is sort of spamming options + picking damaging attacks. it would be a stretch to say that agents were 'good' at battling...
as it stands, battling is wholly unimportant to completing the game, as long as the agents can eventually win the trainer battles that are mandatory for plot advancement. it's funny because everyone thinks about battling when they think about pokemon. the first fn i wrote, back when we were still bumping around pallet town, was a battle reward function. it was trash: over-complicated and it didn't work. the crux of the problem is exploration over a vast, open-world map, and completion of the sundry storyline tasks at distal parts of said map in the correct sequence, without the policy collapsing and without agents overfitting to, say, overworld loops.
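to make the "exploration, not battling" point concrete, here is a minimal sketch of what a reward with no battle term could look like. this is an illustration only, not the reward actually used: the class name, the `(map_id, x, y)` tile encoding, the string story flags, and the bonus values are all assumptions made up for the example.

```python
from typing import Set, Tuple

# Hypothetical position encoding: (map_id, x, y). Not taken from any real emulator API.
Tile = Tuple[int, int, int]


class ExplorationReward:
    """Pays once per newly visited tile and once per newly set story flag.

    There is deliberately no battle term: revisiting tiles (e.g. walking an
    overworld loop) earns nothing, and only new ground or new story progress pays.
    """

    def __init__(self, tile_bonus: float = 0.01, flag_bonus: float = 1.0) -> None:
        self.tile_bonus = tile_bonus  # assumed value, small per-tile novelty bonus
        self.flag_bonus = flag_bonus  # assumed value, larger bonus per story event
        self.seen_tiles: Set[Tile] = set()
        self.seen_flags: Set[str] = set()

    def reset(self) -> None:
        """Clear the visit memory at the start of an episode."""
        self.seen_tiles.clear()
        self.seen_flags.clear()

    def step(self, tile: Tile, flags: Set[str]) -> float:
        """Return the reward for the current tile and the currently set story flags."""
        reward = 0.0
        if tile not in self.seen_tiles:
            self.seen_tiles.add(tile)
            reward += self.tile_bonus
        # Only flags that were not set on any earlier step count as progress.
        new_flags = flags - self.seen_flags
        self.seen_flags |= new_flags
        reward += self.flag_bonus * len(new_flags)
        return reward
```

usage would be something like calling `step(current_tile, current_flags)` once per environment step and `reset()` between episodes. note that a sketch like this says nothing about doing the story tasks in the correct sequence; it only rewards whichever flags happen to get set.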
I know all about rl. I've read go-explore 1/2, and I have personally implemented intrinsic curiosity.
I was just commenting on what the other person said, which is that it would be cool to have the npcs be agents that battle and train too. you said they could not be made to, to which I say: we have the technology. :)