Hacker News new | ask | show | jobs
by parthsareen 557 days ago
We’ve been keeping a close eye on this as well as research is coming out. We’re looking into improving sampling as a whole on both speed and accuracy.

Hopefully with those changes we might also enable general structure generation not only limited to JSON.

1 comments

Who is "we"?
I authored the blog with some other contributors and worked on the feature (PR: https://github.com/ollama/ollama/pull/7900).

The current implementation uses llama.cpp GBNF grammars. The more recent research (Outlines, XGrammar) points to potentially speeding up the sampling process through FSTs and GPU parallelism.

If you want avoid startup cost, llguidance [0] has no compilation phase and by far the fullest JSON support [1] of any library. I did a PoC llama.cpp integration [2] though our focus is mostly server-side [3].

[0] https://github.com/guidance-ai/llguidance [1] https://github.com/guidance-ai/llguidance/blob/main/parser/s... [2] https://github.com/ggerganov/llama.cpp/pull/10224 [3] https://github.com/guidance-ai/llgtrt

I have been thinking about your PR regularly, and pondering about how we should go about getting this merged in.

I really want to see support for additional grammar engines merged into llama.cpp, and I'm a big fan of the work you did on this.

This looks really useful. Thank you!
Thank you for the details!