HN Mirror



	by parthsareen 564 days ago
	I authored the blog with some other contributors and worked on the feature (PR: https://github.com/ollama/ollama/pull/7900). The current implementation uses llama.cpp GBNF grammars. The more recent research (Outlines, XGrammar) points to potentially speeding up the sampling process through FSTs and GPU parallelism.
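
	For readers unfamiliar with grammar-constrained sampling, here is a minimal sketch of the general idea: before each token is sampled, every token the grammar does not allow in the current state has its logit set to negative infinity. This is not the Ollama or llama.cpp implementation. The vocabulary, the hand-written state table, the stand-in `fake_model`, and all function names are assumptions made up for illustration; a real engine derives the allowed-token set from a GBNF grammar or a compiled automaton over a full tokenizer vocabulary.

```python
import math
import random

# Toy vocabulary (hypothetical); a real tokenizer has tens of thousands of entries.
VOCAB = ["{", "}", '"key"', ":", "1", "2", "hello", "<eos>"]

# Hypothetical hand-written state machine for the language  {"key": 1|2} <eos>
# Maps state -> {allowed token: next state}. State 6 is the accepting state.
TRANSITIONS = {
    0: {"{": 1},
    1: {'"key"': 2},
    2: {":": 3},
    3: {"1": 4, "2": 4},
    4: {"}": 5},
    5: {"<eos>": 6},
}
ACCEPT = 6


def mask_logits(logits, state):
    """Set the logit of every token the grammar disallows in `state` to -inf."""
    allowed = TRANSITIONS[state]
    return [l if tok in allowed else -math.inf for tok, l in zip(VOCAB, logits)]


def sample(logits, rng):
    """Softmax sampling; tokens masked to -inf get weight 0 and are never chosen."""
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    return rng.choices(range(len(VOCAB)), weights=weights)[0]


def fake_model(prefix, rng):
    """Stand-in for an LLM forward pass: returns random logits over VOCAB."""
    return [rng.uniform(-1.0, 1.0) for _ in VOCAB]


def constrained_generate(model, rng):
    """Generate tokens until the state machine reaches its accepting state."""
    state, out = 0, []
    while state != ACCEPT:
        logits = mask_logits(model(out, rng), state)
        tok = VOCAB[sample(logits, rng)]
        out.append(tok)
        state = TRANSITIONS[state][tok]
    return out


if __name__ == "__main__":
    print("".join(constrained_generate(fake_model, random.Random(0))))
```

	Even though the stand-in model emits random logits, the output always matches the toy grammar, because disallowed tokens can never be sampled. The per-step cost of computing the allowed set over a large vocabulary is the part that approaches like precompiled automata or GPU-side masking aim to reduce.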

2 comments

mmoskal 563 days ago

If you want to avoid startup cost, llguidance [0] has no compilation phase and by far the fullest JSON support [1] of any library. I did a PoC llama.cpp integration [2], though our focus is mostly server-side [3].

[0] https://github.com/guidance-ai/llguidance
[1] https://github.com/guidance-ai/llguidance/blob/main/parser/s...
[2] https://github.com/ggerganov/llama.cpp/pull/10224
[3] https://github.com/guidance-ai/llgtrt


HanClinto 563 days ago

I have been thinking about your PR regularly, and pondering how we should go about getting it merged.

I really want to see support for additional grammar engines merged into llama.cpp, and I'm a big fan of the work you did on this.


parthsareen 563 days ago

This looks really useful. Thank you!


netghost 564 days ago

Thank you for the details!
