|
|
|
|
|
by simonw
456 days ago
|
|
If you run Gemma via Ollama (as recommended in the Gemma docs) you get exactly that feature, because Ollama provides that for any model that they run for you: https://ollama.com/blog/structured-outputs Under the hood, it is using the llama.cpp grammars mechanism that restricts allowed logits at each step, similar to Outlines. |
|
- We can constrain the output of a JSON grammar (old school llama.cpp)
- We can format inputs to make sure it matches the model format.
- Both of these combined is what llama.cpp does, via @ochafik, in inter alia, https://github.com/ggml-org/llama.cpp/pull/9639.
- ollama isn't plugged into this system AFAIK
To OP's question, specifying a format in the model unlocks training the model specifically had on functions calling: what I sometimes call an "agentic loop", i.e. we're dramatically increasing the odds we're singing in the right tune for the model to do the right thing in this situation.