|
|
|
|
|
by danielhanchen
490 days ago
|
|
Yes tool calling is a prime example!! Ie you have some specific task, and the final output involving some tools, but sadly the steps to call the tools / the stuff in between / the thinking process is missing. You can employ GRPO and maybe add an actual Python environment for the model to learn to act in. |
|