by lbeurerkellner 1164 days ago
(Another LMQL author here)

Cost is definitely a dimension we are considering (research has limited funding after all :) ), especially with the OpenAI API. Lock-step token-level control is difficult to implement with the very limited OpenAI API. To work around this, we implement speculative execution, which lets us validate constraints lazily against the generated output while still failing early if necessary. This means we don't re-query the API for each token (very expensive), but instead request contiguous segments of the token stream and backtrack where necessary.
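
To make the idea a bit more concrete, here is a toy sketch in Python. It is not LMQL's actual implementation: `RANKED`, `complete` and `speculative_generate` are made-up names, the "API" is a hard-coded stand-in, and the `masked` argument only assumes the remote API offers some way to suppress tokens. The point is just to show chunked requests, lazy validation and backtracking to the last valid prefix.

```python
from typing import Callable, List, Set, Tuple

# Toy stand-in for a remote model: a ranked candidate list per position.
# (Hypothetical data; a real API would sample from the model instead.)
RANKED: List[List[str]] = [
    ["The"], ["answer"], ["is"], ["probably", "exactly"], ["42"], ["."],
]


def complete(prefix: List[str], max_tokens: int, masked: Set[str]) -> List[str]:
    """Pretend API call: return up to max_tokens tokens continuing prefix,
    never emitting a token in `masked` (assumed token-suppression feature)."""
    out: List[str] = []
    pos = len(prefix)
    while pos < len(RANKED) and len(out) < max_tokens:
        choices = [t for t in RANKED[pos] if t not in masked]
        if not choices:
            break
        out.append(choices[0])
        pos += 1
    return out


def speculative_generate(
    is_allowed: Callable[[List[str], str], bool], chunk_size: int = 4
) -> Tuple[List[str], int]:
    """Request whole chunks, validate them token by token after the fact,
    and backtrack to the last valid prefix when a constraint is violated.
    Returns the accepted tokens and the number of API calls made."""
    accepted: List[str] = []
    masked: Set[str] = set()
    calls = 0
    while True:
        chunk = complete(accepted, chunk_size, masked)
        calls += 1
        if not chunk:
            break  # model finished, or no valid candidate is left
        for tok in chunk:
            if is_allowed(accepted, tok):
                accepted.append(tok)
                masked = set()   # new position, old masks no longer apply
            else:
                masked.add(tok)  # fail early: drop the rest of this chunk
                break            # and re-query from the valid prefix
    return accepted, calls


if __name__ == "__main__":
    no_hedging = lambda prefix, tok: tok != "probably"
    print(speculative_generate(no_hedging, chunk_size=4))
    print(speculative_generate(no_hedging, chunk_size=1))  # lock-step baseline
```

With `chunk_size=1` this degenerates to the lock-step, one-request-per-token scheme, so comparing the two call counts gives a rough feel for where the savings come from.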

This is still more expensive than doing everything in one request, but that is an inherent limitation of the OpenAI API, not of LMQL. On the upside, you gain more control, scripting and constraints, even with OpenAI models.

Ideally, some program representation of a scripted prompt, such as an LMQL query, could be sent over to the inference service and executed locally there, with full model access. This way, model vendors would not have to expose their models fully (e.g. to protect against distillation), but API users would gain a lot more control and efficiency. Alternatively, of course, better open source models with full access to logits are the ultimate solution, which is also the context in which LMQL was initially conceived.

1 comments

Yeah, that's a good point. Maybe you have a chance to establish a sort of standard here? I guess an API isn't so easily remoted, whereas a language you can just upload for local execution would be a good fit. Great work anyway!