Yes, tool calling is a prime example! I.e. you have some specific task and a final output that involves some tools, but sadly the steps to call the tools (the stuff in between, the thinking process) are missing.
You can employ GRPO and maybe add an actual Python environment for the model to learn to act in.
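To make that a bit more concrete, here is a rough sketch of what the reward side could look like. It is only an illustration under my own assumptions: the `<tool>...</tool>` tag format, the helper names (`run_python`, `reward`, `group_advantages`) and the reward values are all made up, and it is not tied to any particular training library. The idea is that the model only gets the task and the expected final answer, its tool call is actually executed in a Python subprocess, and the rewards are then normalized within a group of sampled completions, which is the group-relative part of GRPO.

```python
import re
import statistics
import subprocess
import sys
from typing import List, Optional

# Hypothetical format: the model wraps the Python it wants to run in <tool> tags.
TOOL_CALL = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)


def run_python(code: str, timeout: float = 5.0) -> Optional[str]:
    """Run code in a fresh interpreter; return stripped stdout, or None on failure.

    Note: a subprocess is NOT a sandbox. Real training would need proper isolation.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout.strip() if proc.returncode == 0 else None


def reward(completion: str, expected: str) -> float:
    """Score one completion against the known final answer (values are arbitrary)."""
    match = TOOL_CALL.search(completion)
    if match is None:
        return 0.0  # never tried to use the tool
    output = run_python(match.group(1))
    if output is None:
        return 0.1  # tried, but the code crashed or timed out
    return 1.0 if output == expected else 0.2  # ran fine; right or wrong answer


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled completions for the same prompt ("what is 6 * 7?").
    group = [
        "<tool>print(6 * 7)</tool>",
        "The answer is 42.",
        "<tool>print(1 / 0)</tool>",
        "<tool>print(41)</tool>",
    ]
    rewards = [reward(c, expected="42") for c in group]
    print(rewards)
    print(group_advantages(rewards))
```

The small partial rewards for "tried but crashed" and "ran but wrong" are just one way to give the model some signal before it ever gets a fully correct answer; you would want to tune or drop them for a real task.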