by parthsareen
564 days ago
I authored the blog post with some other contributors and worked on the feature (PR: https://github.com/ollama/ollama/pull/7900). The current implementation uses llama.cpp GBNF grammars. More recent research (Outlines, XGrammar) points to potentially speeding up the sampling process through FSTs and GPU parallelism.