Hacker News
by samwillis 1134 days ago
On the topic of JSON output from these models, someone has added context-free grammars to llama.cpp. This forces the output to match the grammar by effectively zeroing the probability of any next token that does not conform to it.

https://twitter.com/GrantSlatton/status/1657559506069463040

https://github.com/grantslatton/llama.cpp/commit/007e26a99d4...

It's so obvious, it's genius.
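
For anyone wondering what "zeroing the probability" looks like in practice, here is a minimal sketch. It is not the llama.cpp implementation: the vocabulary, the balanced-brackets grammar, and the `fake_logits` stand-in for a model are all made up for illustration. The idea is only that tokens the grammar disallows get a logit of negative infinity before the softmax, so they can never be picked.

```python
import math

# Hypothetical four-token vocabulary, for illustration only.
VOCAB = ["[", "]", "x", "<eos>"]

def allowed(prefix: str, token: str) -> bool:
    """Toy grammar: balanced square brackets (a context-free language)."""
    depth = prefix.count("[") - prefix.count("]")
    if token == "[":
        return True
    if token == "]":
        return depth > 0
    if token == "<eos>":
        return depth == 0 and prefix != ""
    return False  # "x" is never part of the grammar

def fake_logits(prefix: str) -> list[float]:
    """Stand-in for a real model. It always scores the invalid "x" highest."""
    return [2.5 - len(prefix), 1.0, 3.0, 0.0]

def constrained_probs(prefix: str) -> list[float]:
    """Softmax over logits after masking tokens the grammar disallows."""
    logits = fake_logits(prefix)
    masked = [l if allowed(prefix, t) else -math.inf
              for l, t in zip(logits, VOCAB)]
    top = max(masked)  # finite here, since "[" is always allowed
    exps = [math.exp(l - top) for l in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

def generate(max_tokens: int = 20) -> str:
    """Greedy decoding, one token at a time, under the grammar mask."""
    prefix = ""
    for _ in range(max_tokens):
        probs = constrained_probs(prefix)
        token = VOCAB[probs.index(max(probs))]
        if token == "<eos>":
            break
        prefix += token
    return prefix

print(generate())
```

Without the mask, greedy decoding on these made-up scores would emit "x" at every step. With it, every step can only choose among tokens that keep the output inside the grammar.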

1 comments

It's genius, but it's also solving the "easy" problem of checking syntax. If you ask an LLM to generate some structured data representing something you describe (or a program that "does X"), checking the result for valid syntax is just the first step. You then need to check for semantic validity, i.e., is it what you want?
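
To make the syntax versus semantics gap concrete, here is a small sketch. The date-range request and the `start`/`end` field names are invented for this example. The output below is perfectly valid JSON, so a grammar would accept it, yet it fails the check that actually matters.

```python
import json
from datetime import date

def is_valid_syntax(text: str) -> bool:
    """Syntactic check: does the text parse as JSON at all?"""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def is_semantically_valid(text: str) -> bool:
    """Semantic check (hypothetical requirement): start must not be after end."""
    data = json.loads(text)
    start = date.fromisoformat(data["start"])
    end = date.fromisoformat(data["end"])
    return start <= end

# Imagined model output for "give me a date range in May 2023".
output = '{"start": "2023-05-14", "end": "2023-05-01"}'

print(is_valid_syntax(output))        # parses fine
print(is_semantically_valid(output))  # but the range is backwards
```

A grammar can guarantee the first check passes. The second one still has to be written per task.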
Oh yes, but it's nice that this technique enforces the grammar during generation, one token at a time, rather than having to check after completing the query and rerun for adjustments.
You're right, but an LLM is already trying to make sense (i.e. predict well) within the constraints given. So if you constrain the syntax, it's trying to fill it with the correct semantics. Doesn't always manage it, but it's trying.

This is similar to how, if you ask a question in a given language, it responds in that language, yet it still follows the instructions (hidden prompt) that were given to it only in English.

I.e. an LLM is essentially about finding an intersection of requirements in order to predict output.