Hacker News new | ask | show | jobs
by refulgentis 449 days ago
I've been working on tool calling in llama.cpp for Phi-4 and have a client that can switch between local models and remote for agentic work/search/etc., I learned a lot about this situation recently:

- We can constrain the output of a JSON grammar (old school llama.cpp)

- We can format inputs to make sure it matches the model format.

- Both of these combined is what llama.cpp does, via @ochafik, in inter alia, https://github.com/ggml-org/llama.cpp/pull/9639.

- ollama isn't plugged into this system AFAIK

To OP's question, specifying a format in the model unlocks training the model specifically had on functions calling: what I sometimes call an "agentic loop", i.e. we're dramatically increasing the odds we're singing in the right tune for the model to do the right thing in this situation.

1 comments

Do you have thoughts on the code-style agents recommended by huggingface? The pitch for them is compelling, since structuring complex tasks in code is something very natural for LLMs. But then, I don’t see as much about this approach outside of HF.