HN Mirror

LMQL author here, thanks for commenting. prlang looks really cool, and I will have a closer look. LLMs + PLs is a very interesting field right now, with lots of directions to explore.

LMQL's efficiency gains come from close supervision of the generation process: token masking via constraints is integrated directly into the decoding loop and happens at the token level. Compared to high-level, text-based APIs, this means you avoid a lot of useless continuations that the LM would otherwise produce and that you would have to discard further down the pipeline anyway, e.g. because you want to enforce constraints or insert a follow-up instruction or a tool execution result.
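To make the difference concrete, here is a minimal toy sketch of token-level masking inside a greedy decoding loop, compared with generating freely and filtering afterwards. This is not LMQL's implementation or API: `VOCAB`, `toy_scores` (a stand-in for an LM forward pass), and the "output must be one of a fixed set of strings" constraint are all hypothetical, chosen only to show where the savings in model calls come from.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical toy vocabulary; real tokenizers have tens of thousands of entries.
VOCAB: List[str] = ["y", "es", "n", "o", " I", " think", " maybe", ".", "<eos>"]

# Hypothetical "model" preference: favourite next token given the last token.
NEXT: Dict[str, str] = {"": " I", " I": " think", " think": " maybe", " maybe": ".", ".": "<eos>"}


def toy_scores(tokens: List[str]) -> Dict[str, float]:
    """Stand-in for one LM forward pass: returns a score per vocabulary token."""
    last = tokens[-1] if tokens else ""
    scores = {tok: 0.0 for tok in VOCAB}
    scores[NEXT.get(last, "<eos>")] = 5.0  # the model likes to ramble
    scores["y"] = 2.0                      # a short answer is only second choice
    scores["es"] = 1.0
    return scores


def allowed_tokens(text: str, allowed: Set[str]) -> Set[str]:
    """Tokens that keep the output a prefix of at least one allowed string."""
    return {t for t in VOCAB if t != "<eos>" and any(a.startswith(text + t) for a in allowed)}


def constrained_decode(allowed: Set[str], max_tokens: int = 10) -> Tuple[str, int]:
    """Greedy decoding with the constraint applied as a mask at every step."""
    tokens: List[str] = []
    calls = 0
    while len(tokens) < max_tokens:
        text = "".join(tokens)
        if text in allowed:  # toy rule: stop at the first complete match
            break
        scores = toy_scores(tokens)
        calls += 1
        ok = allowed_tokens(text, allowed)
        # Mask: tokens that would violate the constraint get -inf and can never win.
        masked = {t: (s if t in ok else -math.inf) for t, s in scores.items()}
        best = max(masked, key=masked.get)
        if masked[best] == -math.inf:  # no valid continuation exists
            break
        tokens.append(best)
    return "".join(tokens), calls


def unconstrained_decode(max_tokens: int = 10) -> Tuple[str, int]:
    """Greedy decoding with no mask; the constraint can only be checked afterwards."""
    tokens: List[str] = []
    calls = 0
    while len(tokens) < max_tokens:
        scores = toy_scores(tokens)
        calls += 1
        best = max(scores, key=scores.get)
        if best == "<eos>":
            break
        tokens.append(best)
    return "".join(tokens), calls


if __name__ == "__main__":
    print(constrained_decode({"yes", "no"}))  # ('yes', 2)
    print(unconstrained_decode())             # (' I think maybe.', 5)
```

In this toy setup the masked loop reaches a valid answer after 2 model calls, while the unconstrained loop spends 5 calls producing `" I think maybe."`, which fails the constraint and would have to be discarded (and re-prompted) further down the pipeline. Real decoders apply the same idea to a logits tensor before sampling rather than to a Python dict.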