|
|
|
|
|
by svachalek
472 days ago
|
|
Yeah the state of the art is pretty awful. There have been multiple incidents where a model has been dropped on ollama with the wrong chat template, resulting in it seeming to work but with greatly degraded performance. And I think it's always been a user that notices, not the ollama team or the model team. |
|
I make a llama.cpp wrapper myself, and it's somewhat frustrating putting effort in for everything from big obvious UX things, like error'ing when the context is too small for your input instead of just making you think the model is crap, to long-haul engineering commitments, like integrating new models with llama.cpp's new tool calling infra, and testing them to make sure it, well, actually works.
I keep telling myself that this sort of effort pays off a year or two down the road, once all that differentiation in effort day-to-day adds up. I hope :/