| HN Mirror



	by seertaak 560 days ago
	Could someone explain how this is implemented? I saw on Meta's Llama page that the model has intrinsic support for structured output. My 30k ft mental model of an LLM is as a text completer, so it's not clear to me how this is accomplished. Are llama.cpp and ollama leveraging Llama's intrinsic structured output capability, or is this something else bolted onto the output after the fact? (And if the former, how is the capability guaranteed across other models?)

1 comments

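Presumably, at each decoding step they mask out all tokens that would be invalid at that step according to the grammar, so only tokens the grammar allows can be sampled. If so, that happens in the sampler rather than in the model, which would explain how it carries over to other models.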
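
To make the masking idea concrete, here is a toy sketch in Python. It is not how llama.cpp or ollama actually implement it; everything in it is made up for illustration: the tiny `VOCAB`, the `GRAMMAR` state machine, and `fake_logits`, which stands in for a real model forward pass and deliberately prefers an invalid token. The point is only that the constraint is applied to the logits at each step, before a token is picked.

```python
import math

# Toy vocabulary (hypothetical); a real tokenizer has tens of thousands of entries.
VOCAB = ["{", "}", '"key"', ":", "1", "hello", ","]

# Hypothetical grammar as a finite-state machine: state -> {allowed token: next state}.
# It accepts strings such as {"key":1} or {"key":1,"key":1}.
GRAMMAR = {
    "start": {"{": "want_key"},
    "want_key": {'"key"': "want_colon"},
    "want_colon": {":": "want_value"},
    "want_value": {"1": "after_value"},
    "after_value": {",": "want_key", "}": "done"},
}


def fake_logits(step):
    """Stand-in for a model forward pass; it always prefers the invalid token 'hello'."""
    scores = {"hello": 5.0, "}": 2.0, ",": 1.0}
    return [scores.get(tok, 0.0) for tok in VOCAB]


def mask_logits(logits, state):
    """Set the logit of every token the grammar disallows in `state` to -inf."""
    allowed = GRAMMAR[state]
    return [l if tok in allowed else -math.inf for tok, l in zip(VOCAB, logits)]


def constrained_greedy_decode(max_steps=20):
    """Greedy decoding where each step only considers grammar-valid tokens."""
    state, out = "start", []
    for step in range(max_steps):
        if state == "done":
            break
        masked = mask_logits(fake_logits(step), state)
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        token = VOCAB[best]
        out.append(token)
        state = GRAMMAR[state][token]  # advance the grammar alongside the output
    return "".join(out)


if __name__ == "__main__":
    print(constrained_greedy_decode())
```

Even though the fake model scores `hello` highest at every step, it can never be emitted, because its logit is `-inf` in every grammar state. With sampling instead of greedy argmax, the same mask would be applied before the softmax, so invalid tokens get probability zero.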

That makes sense. Thanks.