by refulgentis
460 days ago
The Ollama stuff is the old llama.cpp approach that constrains output tokens. It's great; I've used it to get outputs from a model as small as 1B. But there's a stark difference in quality compared with, say, Phi-4's native tool-calling. If Gemma 3 is natively trained on tool-calling, i.e. y'all are benchmarking on, say, the Berkeley Function Calling Leaderboard, that'd be great to know out here.

Tangentially, github.com/ochafik is a Googler who landed an excellent overhaul of llama.cpp's tool-calling. Might be worth reaching out to him (if you're not working with him already!)
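For anyone unfamiliar with what "constraining output tokens" means, here is a toy sketch of the general idea. It is not llama.cpp's actual grammar (GBNF) implementation. The vocabulary, the set of permitted outputs, and the scoring function are all hypothetical stand-ins. At each step, any token that can't extend the text toward a permitted output is masked out, and the best remaining token is chosen greedily:

```python
from typing import Callable, List

# Hypothetical toy vocabulary of multi-character tokens, like a real tokenizer's.
VOCAB: List[str] = ['{"', 'name', '":"', 'get_', 'weather', 'time', '"}', 'hello', ' ']

# Hypothetical permitted outputs, standing in for a real grammar.
ALLOWED: List[str] = ['{"name":"get_weather"}', '{"name":"get_time"}']


def allowed_next(prefix: str) -> List[int]:
    """Indices of tokens that keep `prefix` a prefix of some permitted output."""
    return [i for i, tok in enumerate(VOCAB)
            if any(s.startswith(prefix + tok) for s in ALLOWED)]


def constrained_greedy(logits_fn: Callable[[str], List[float]], max_steps: int = 16) -> str:
    """Greedy decoding that only ever picks tokens the constraint allows."""
    prefix = ""
    for _ in range(max_steps):
        if prefix in ALLOWED:
            break  # a complete permitted output has been produced
        logits = logits_fn(prefix)
        candidates = allowed_next(prefix)
        if not candidates:
            raise RuntimeError(f"no permitted continuation after {prefix!r}")
        # Mask: ignore every disallowed token, then take the highest-scoring one left.
        best = max(candidates, key=lambda i: logits[i])
        prefix += VOCAB[best]
    return prefix


def toy_logits(prefix: str) -> List[float]:
    """Hypothetical 'model' that really wants to chat and slightly prefers 'time'."""
    scores = {tok: 0.0 for tok in VOCAB}
    scores['hello'] = 5.0
    scores['time'] = 1.0
    return [scores[t] for t in VOCAB]


if __name__ == "__main__":
    print(constrained_greedy(toy_logits))  # prints {"name":"get_time"}
```

Unconstrained, this toy model would just emit "hello". With the mask, it is forced into a well-formed call. The constraint guarantees the *shape* of the output, but which tool gets picked still comes down to the model's own scores. That's the gap native tool-calling training is meant to close.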