by seertaak 560 days ago
Could someone explain how this is implemented? I saw on Meta's Llama page that the model has intrinsic support for structured output. My 30,000-ft mental model of an LLM is a text completer, so it's not clear to me how this is accomplished.

Are llama.cpp and ollama leveraging Llama's intrinsic structured output capability, or is this something else bolted onto the output after the fact? (And if the former, how is the capability guaranteed across other models?)

1 comment

Presumably, at each decoding step they mask out every token that would be invalid at that point according to the grammar, so only grammar-valid tokens can be sampled.
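
To make the masking idea concrete, here is a minimal toy sketch in Python. It is not the llama.cpp or ollama implementation. The vocabulary, the `allowed_next` "grammar" (which accepts one fixed token sequence), and `fake_model` are all hypothetical stand-ins made up for illustration. The point is only that the constraint is applied to the logits at sampling time, which is why it would not depend on any capability built into a particular model.

```python
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["{", "}", '"a"', ":", "1", "hello"]

# Toy stand-in for a grammar: the only valid output is { "a" : 1 }.
TARGET = ["{", '"a"', ":", "1", "}"]


def allowed_next(prefix):
    """Return the set of tokens the toy grammar permits after `prefix`."""
    if len(prefix) < len(TARGET) and prefix == TARGET[:len(prefix)]:
        return {TARGET[len(prefix)]}
    return set()  # output is complete (or off-grammar): nothing is allowed


def mask_logits(logits, allowed):
    """Set the logit of every token the grammar forbids to -inf."""
    return [l if tok in allowed else -math.inf for tok, l in zip(VOCAB, logits)]


def softmax(logits):
    """Numerically stable softmax; -inf logits get probability 0."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fake_model(prefix):
    """Stand-in for an LLM forward pass: it always prefers "hello"."""
    return [0.1, 0.2, 0.3, 0.4, 0.5, 5.0]


def constrained_greedy_decode(model, max_steps=10):
    """Greedy decoding where grammar-invalid tokens are masked at each step."""
    out = []
    for _ in range(max_steps):
        allowed = allowed_next(out)
        if not allowed:
            break
        probs = softmax(mask_logits(model(out), allowed))
        out.append(VOCAB[probs.index(max(probs))])
    return out


print(constrained_greedy_decode(fake_model))
```

Even though the fake model always scores "hello" highest, that token is never emitted, because its logit is masked before a token is chosen. A real implementation would track grammar state incrementally rather than recomputing it from the prefix each step.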
That makes sense. Thanks.