by jmorgan
729 days ago
Sorry it's taking so long to review, and for the radio silence on the PR.
We have been trying to figure out how to support more structured output formats without some of the side effects of grammars. With JSON mode (which uses grammars under the hood) there were originally quite a few issue reports, mainly about lower performance and cases where the model would generate whitespace indefinitely, causing requests to hang. This is an issue with OpenAI's JSON mode as well, which requires the caller to "instruct the model to produce JSON" [1]. While it's possible to handle edge cases for a single grammar such as JSON (e.g. check for 'JSON' in the prompt), it's hard to generalize this to any format.
Supporting more structured output formats is definitely important. Fine-tuning for output formats is promising, and this thread [2] also has some great ideas and links.
[1] https://platform.openai.com/docs/guides/text-generation/json...
[2] https://github.com/ggerganov/llama.cpp/issues/4218
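To make the "check for 'JSON' in the prompt" edge case concrete, here is a minimal sketch of such a guard. It is only an illustration of the heuristic, not how Ollama or OpenAI actually implement it; the names `prompt_mentions_json` and `check_json_mode` are hypothetical.

```python
import re
from typing import Optional

# Case-insensitive match on the standalone word "json".
_JSON_WORD = re.compile(r"\bjson\b", re.IGNORECASE)


def prompt_mentions_json(prompt: str) -> bool:
    """Return True if the prompt mentions JSON as a standalone word."""
    return _JSON_WORD.search(prompt) is not None


def check_json_mode(prompt: str, output_format: Optional[str]) -> None:
    """Hypothetical guard: reject a JSON-mode request whose prompt never asks for JSON.

    Without such an instruction, a grammar-constrained model may keep
    emitting whitespace instead of starting a JSON object.
    """
    if output_format == "json" and not prompt_mentions_json(prompt):
        raise ValueError("JSON mode requested, but the prompt never mentions JSON")
```

A check like this is easy to write for one format, but there is no obvious equivalent for an arbitrary user-supplied grammar, which is the generalization problem described above.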
|
I've been using llama.cpp for about a year now, mostly implementing some RAG- and ReAct-related papers to stay up to date. I mostly used llama.cpp at first, but for the past few months I have been using both Ollama and llama.cpp.
If you added grammars, I wouldn't have to run two servers. I think you're doing an excellent job maintaining Ollama; every update is like Christmas. The llama.cpp team also doesn't seem to treat the server as a priority (it's still literally just an example of how you'd use their C API).
So I understand your position, since llama.cpp's server API has been quite unstable, and the grammar validation didn't work at all until February. I also still can't get their multiple model loading to work reliably.
Having said that, GBNF is a godsend for my daily use cases. I even prefer using phi3b with a grammar to dealing with the hallucinations of a 70b without it. Fine-tuning helps a lot, but it can't solve the problem fully (you still need to validate the generation), and it's a lot less agile when implementing ideas. Creating synthetic data sets is also easier if you have support for grammars.
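As a rough illustration of the difference, here is a small sketch that builds a GBNF grammar restricting output to a fixed set of labels, next to the kind of after-the-fact check you still need when there is no grammar. The helper names are hypothetical, and it assumes the labels are plain words with no quotes or backslashes that would need escaping.

```python
from typing import Sequence


def gbnf_for_labels(labels: Sequence[str]) -> str:
    """Build a one-rule GBNF grammar that only admits the given labels.

    Assumes labels contain no quotes or backslashes (no escaping is done).
    """
    alternatives = " | ".join(f'"{label}"' for label in labels)
    return f"root ::= {alternatives}"


def is_valid_label(text: str, labels: Sequence[str]) -> bool:
    """Post-hoc check for unconstrained output: exact match against the labels."""
    return text in labels


LABELS = ["yes", "no"]
GRAMMAR = gbnf_for_labels(LABELS)  # root ::= "yes" | "no"
```

With the grammar, an answer like "yes, probably" can't be generated in the first place; without it, `is_valid_label` has to catch it and you have to retry.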
I think many like me are in the same spot. Thank you for being considerate about the stability and support that it would require. But please, take a look at the current state of their grammar validation, which is pretty good right now.