If by that you mean reinforcement learning, that's not the case; see, e.g., https://arxiv.org/abs/2501.12948