Hacker News
by regen7253 734 days ago
Thank you!

I've been using llama.cpp for about a year now, mostly implementing ideas from RAG and ReAct related papers to stay up to date. For the last few months, I've been running both Ollama and llama.cpp.

If you added grammars, I wouldn't have to run two servers. I think you're doing an excellent job of maintaining Ollama; every update is like Christmas. The llama.cpp folks also don't seem to have the server as a priority (it's still literally just an example of how you'd use their C API).

So I understand your position: llama.cpp's server API has been quite unstable, and its grammar validation didn't work at all until February. I also still can't get its multiple model loading to work reliably.

Having said that, GBNF is a godsend for my daily use cases. I'd even rather use phi3b with a grammar than deal with the hallucinations of a 70b without one. Fine-tuning helps a lot, but it can't solve the problem fully (you still need to validate the generation), and it's a lot less agile when implementing ideas. Creating synthetic data sets is also easier if you have support for grammars.
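
To make that concrete, here is a rough sketch of the kind of setup I mean, not a definitive implementation. It assumes the llama.cpp example server and its `/completion` endpoint, which accepts a `grammar` field containing GBNF text. The grammar itself, `build_completion_payload`, and `validate_answer` are hypothetical names made up for illustration, and nothing here actually contacts a server.

```python
import json

# A small GBNF grammar (llama.cpp's grammar format). It only admits a JSON
# object with a single "answer" key whose value is "yes" or "no".
# Illustrative grammar, not taken from any project.
ANSWER_GRAMMAR = r'''
root   ::= "{" ws "\"answer\"" ws ":" ws answer ws "}"
answer ::= "\"yes\"" | "\"no\""
ws     ::= [ \t\n]*
'''


def build_completion_payload(prompt, grammar=ANSWER_GRAMMAR, n_predict=32):
    """Build the JSON body for a llama.cpp server /completion request.

    Assumption: the server accepts "prompt", "n_predict" and "grammar" keys.
    Sending the request is left out so the sketch runs without a server.
    """
    return {"prompt": prompt, "n_predict": n_predict, "grammar": grammar}


def validate_answer(text):
    """Check a generation independently of the grammar.

    Useful when the backend has no grammar support (e.g. output from a
    fine-tuned model), where the result still has to be validated.
    """
    data = json.loads(text)
    if not isinstance(data, dict) or set(data) != {"answer"}:
        raise ValueError("expected an object with a single 'answer' key")
    if data["answer"] not in ("yes", "no"):
        raise ValueError("answer must be 'yes' or 'no'")
    return data["answer"]


if __name__ == "__main__":
    payload = build_completion_payload("Is llama.cpp written in C++? Answer in JSON.")
    print(json.dumps(payload, indent=2))
    print(validate_answer('{"answer": "yes"}'))
```

With the grammar applied at sampling time, the validator should never fire. Without it, every response has to go through something like `validate_answer` plus a retry loop, which is the overhead I'd like to avoid.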

I think many like me are in the same spot. Thank you for being considerate about the stability and support that it would require. But please, take a look at the current state of llama.cpp's grammar validation, which is pretty good right now.