
	by parthsareen 557 days ago
	We’ve been keeping a close eye on this as new research comes out. We’re looking into improving sampling as a whole, in both speed and accuracy. With those changes we hope to also enable general structured generation that is not limited to JSON.

hackernewds 557 days ago

Who is "we"?

parthsareen 557 days ago

I authored the blog with some other contributors and worked on the feature (PR: https://github.com/ollama/ollama/pull/7900).

The current implementation uses llama.cpp GBNF grammars. The more recent research (Outlines, XGrammar) points to potentially speeding up the sampling process through FSTs and GPU parallelism.
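
To make the idea of grammar-constrained sampling concrete, here is a minimal sketch. It is not Ollama's or llama.cpp's actual code: the vocabulary, the hand-written token automaton, and the `fake_model` function are all hypothetical stand-ins. It only shows the general technique of masking out tokens the grammar forbids before picking the next one.

```python
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["{", '"a"', ":", "0", "1", "}", "hello"]

# Hypothetical token-level automaton accepting exactly {"a":0} or {"a":1}.
# Maps state -> {allowed token -> next state}.
TRANSITIONS = {
    0: {"{": 1},
    1: {'"a"': 2},
    2: {":": 3},
    3: {"0": 4, "1": 4},
    4: {"}": 5},
}
ACCEPT = 5


def mask_logits(logits, state):
    """Set the logit of every token the automaton forbids in `state` to -inf."""
    allowed = TRANSITIONS[state]
    return [x if tok in allowed else -math.inf for tok, x in zip(VOCAB, logits)]


def constrained_greedy(logits_fn, max_steps=16):
    """Greedy decoding that only ever emits tokens the automaton allows."""
    state, out = 0, []
    for _ in range(max_steps):
        if state == ACCEPT:
            break
        masked = mask_logits(logits_fn(out), state)
        idx = max(range(len(VOCAB)), key=masked.__getitem__)
        tok = VOCAB[idx]
        out.append(tok)
        state = TRANSITIONS[state][tok]
    return "".join(out)


def fake_model(prefix):
    """Stand-in for a model: always prefers "hello", then "1"."""
    return [0.1, 0.2, 0.3, 0.4, 0.9, 0.5, 5.0]


print(constrained_greedy(fake_model))
```

Even though the stand-in model always scores "hello" highest, the mask forces the output to stay inside the grammar. The per-step cost of computing that mask over a large vocabulary is the part that precompiled automata or GPU-side masking would aim to reduce.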

mmoskal 557 days ago

If you want to avoid startup cost, llguidance [0] has no compilation phase and by far the fullest JSON support [1] of any library. I did a PoC llama.cpp integration [2], though our focus is mostly server-side [3].

[0] https://github.com/guidance-ai/llguidance
[1] https://github.com/guidance-ai/llguidance/blob/main/parser/s...
[2] https://github.com/ggerganov/llama.cpp/pull/10224
[3] https://github.com/guidance-ai/llgtrt

HanClinto 556 days ago

I have been thinking about your PR regularly and pondering how we should go about getting it merged.

I really want to see support for additional grammar engines merged into llama.cpp, and I'm a big fan of the work you did on this.

parthsareen 557 days ago

This looks really useful. Thank you!

netghost 557 days ago

Thank you for the details!
