by refulgentis
449 days ago
I've been working on tool calling in llama.cpp for Phi-4, and I have a client that can switch between local and remote models for agentic work/search/etc. I learned a lot about this situation recently:

- We can constrain the output with a JSON grammar (old-school llama.cpp).
- We can format inputs to make sure they match the model's format.
- Both of these combined is what llama.cpp does, via @ochafik, in, inter alia, https://github.com/ggml-org/llama.cpp/pull/9639.
- ollama isn't plugged into this system AFAIK.

To OP's question: specifying a format in the model unlocks the training the model specifically had on function calling, what I sometimes call an "agentic loop". I.e., we're dramatically increasing the odds we're singing in the right tune for the model to do the right thing in this situation.
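To make the two halves concrete, here's a rough Python sketch of what a client might assemble before talking to a local server. Everything in it is an assumption for illustration: the `get_weather` tool, the helper names, and especially the prompt layout, which is a made-up placeholder and not Phi-4's (or any model's) real chat template. The `prompt` and `json_schema` keys follow my understanding of the llama.cpp server's `/completion` request body; check the server docs before relying on them. It is not what the linked PR does internally.

```python
import json

# Hypothetical tool definition, for illustration only.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def tool_call_schema(tool: dict) -> dict:
    """Output side: a JSON schema that only admits a call to `tool`."""
    return {
        "type": "object",
        "properties": {
            "name": {"enum": [tool["name"]]},
            "arguments": tool["parameters"],
        },
        "required": ["name", "arguments"],
    }


def format_prompt(tools: list, user_message: str) -> str:
    """Input side: render tools and the message in a placeholder layout.

    A real client would use the model's own chat template here instead.
    """
    tool_lines = "\n".join(json.dumps(t, sort_keys=True) for t in tools)
    return (
        "### Tools\n" + tool_lines + "\n"
        "### User\n" + user_message + "\n"
        "### Assistant\n"
    )


def build_request(tools: list, user_message: str) -> dict:
    """Combine both: formatted input plus a schema to constrain the output."""
    return {
        "prompt": format_prompt(tools, user_message),
        # Assumed request field; the server turns the schema into a grammar.
        "json_schema": tool_call_schema(tools[0]),
    }


if __name__ == "__main__":
    print(json.dumps(build_request([WEATHER_TOOL], "Weather in Oslo?"), indent=2))
```

The sketch only constrains to a single tool to keep it short; with several tools you'd presumably want a schema that accepts any one of them.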