by tuchsen
1122 days ago
Not associated with this project (or LMQL), but one of the authors of LMQL, a similar project, answered this in a recent thread about it.

https://news.ycombinator.com/item?id=35484673#35491123

  As a solution to this, we implement speculative execution, allowing us to
  lazily validate constraints against the generated output, while still
  failing early if necessary. This means, we don't re-query the API for
  each token (very expensive), but rather can do it in segments of
  continuous token streams, and backtrack where necessary

Basically they use OpenAI's streaming API and validate the output continuously as it arrives, backtracking and re-querying only when a constraint is violated. It's a really clever solution.
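
To make that concrete, here is a rough Python sketch of the idea as I understand it from the quote. It is not LMQL's actual code. Everything in it is made up for illustration: `generate_with_backtracking`, `stream_from`, `is_valid_prefix`, and the scripted fake model that stands in for a real streaming API. One thing it glosses over: against a real API you would presumably also have to steer the model away from the rejected token on the retry (for example by masking it), otherwise it could just produce the same token again.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Tokens = List[str]


def generate_with_backtracking(
    stream_from: Callable[[Tokens], Iterator[str]],  # hypothetical: streams a continuation of a prefix
    is_valid_prefix: Callable[[Tokens], bool],       # hypothetical: constraint check on the output so far
    max_tokens: int = 64,
    max_retries: int = 5,
) -> Tokens:
    """Consume a token stream, validate each token as it arrives, and
    re-query only when a constraint is violated."""
    accepted: Tokens = []
    retries = 0
    while len(accepted) < max_tokens:
        violated = False
        # One request covers a whole segment of tokens, not a single token.
        for token in stream_from(accepted):
            if not is_valid_prefix(accepted + [token]):
                violated = True
                break  # fail early: discard the rest of this stream
            accepted.append(token)
            if len(accepted) >= max_tokens:
                break
        if not violated:
            break  # the stream finished (or hit max_tokens) without a violation
        # Backtrack: keep only the validated prefix and ask again from there.
        retries += 1
        if retries > max_retries:
            raise RuntimeError("constraint still violated after retries")
    return accepted


def make_fake_model(
    scripts: List[Tokens],
) -> Tuple[Callable[[Tokens], Iterator[str]], Dict[str, int]]:
    """Stand-in for a streaming API: the n-th call replays the n-th script,
    continuing from wherever the given prefix ends."""
    calls = {"n": 0}

    def stream_from(prefix: Tokens) -> Iterator[str]:
        script = scripts[min(calls["n"], len(scripts) - 1)]
        calls["n"] += 1
        yield from script[len(prefix):]

    return stream_from, calls


# The first "request" goes wrong at the third token; the second one is fine.
stream_from, calls = make_fake_model([
    ["1", "2", "x", "4"],
    ["1", "2", "3", "4"],
])
result = generate_with_backtracking(
    stream_from,
    is_valid_prefix=lambda toks: all(t.isdigit() for t in toks),
)
print(result, calls["n"])  # ['1', '2', '3', '4'] 2
```

Four tokens come back from two requests instead of four, and the second request only happens because the third token of the first stream broke the digits-only constraint.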