by tuchsen 1169 days ago

I'm not associated with this project or with LMQL, but one of the authors of LMQL, a similar project, addressed this in a recent thread about LMQL.

https://news.ycombinator.com/item?id=35484673#35491123

        As a solution to this, we implement speculative execution, allowing us to
        lazily validate constraints against the generated output, while still
        failing early if necessary. This means, we don't re-query the API for
        each token (very expensive), but rather can do it in segments of
        continuous token streams, and backtrack where necessary

Basically, they use OpenAI's streaming API and validate the output continuously as it arrives, backtracking and re-querying only when a constraint is violated. It's a really clever solution.
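
To make the idea concrete, here is a minimal sketch of validate-while-streaming with backtracking. It is not LMQL's implementation and it does not call any real API: `fake_stream` and `SCRIPTS` are hypothetical stand-ins for a streaming completion endpoint, and `consume_until_invalid` and `generate` are names I made up for illustration. A real system would also need some way to steer the re-query away from the rejected token; the scripted stand-in simply returns a different continuation.

```python
from typing import Callable, Dict, Iterator, List, Tuple

def consume_until_invalid(
    prefix: str,
    stream: Iterator[str],
    is_valid_prefix: Callable[[str], bool],
) -> Tuple[str, bool]:
    """Read tokens until the stream ends or a token breaks the constraint.

    Returns (accepted_text, finished). The offending token is discarded,
    so accepted_text is the point to backtrack to.
    """
    text = prefix
    for token in stream:
        if not is_valid_prefix(text + token):
            return text, False  # stop reading early; caller re-queries from `text`
        text += token
    return text, True

def generate(
    start_stream: Callable[[str], Iterator[str]],
    is_valid_prefix: Callable[[str], bool],
    max_attempts: int = 5,
) -> str:
    """Stream, validate as tokens arrive, and re-query only after a violation."""
    text = ""
    for _ in range(max_attempts):
        text, finished = consume_until_invalid(text, start_stream(text), is_valid_prefix)
        if finished:
            return text
    raise RuntimeError("constraint not satisfied within max_attempts")

# Hypothetical scripted "model": maps the text so far to the tokens it streams next.
SCRIPTS: Dict[str, List[str]] = {"": ["4", "2", "x", "7"], "42": ["0"]}
calls: List[str] = []

def fake_stream(prefix: str) -> Iterator[str]:
    calls.append(prefix)  # record each (simulated) API request
    return iter(SCRIPTS[prefix])

# Constraint: the output must consist of digits only.
result = generate(fake_stream, str.isdigit)
print(result)  # 420
print(calls)   # ['', '42']
```

Two simulated requests cover three accepted tokens here; re-querying per token would have taken one request for each.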

newhouseb 1169 days ago

This is slick. It's not explicitly documented anywhere, but I hope OpenAI has the necessary callbacks to terminate generation when the API stream is killed, rather than continuing in the background until another termination condition is hit. I suppose one could check this by looking at API usage after killing a stream early.

tuchsen 1169 days ago

Yeah, I wrote a CLI tool for talking to ChatGPT. I'm pretty sure they stop generating when you kill the SSE stream, based on my anecdotal experience of keeping ChatGPT4 costs down by killing it as soon as I get the answer I'm looking for. You're right that it's undocumented behavior, though. On the whole, the API docs they give you are as thin as the API itself.

killthebuddha 1169 days ago

I'm skeptical that the streaming API would really save that much. In my experience, the vast majority of tokens used are input (prompt) tokens rather than completion tokens.
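
A back-of-the-envelope sketch of that point, under two loud assumptions: killing the stream really does stop billing for ungenerated tokens (undocumented, as noted upthread), and prices are in made-up relative units (input = 1 per token, output = 2 per token), not real pricing. The function names are hypothetical.

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Total cost of one request, billed per prompt token and per completion token."""
    return prompt_tokens * input_price + completion_tokens * output_price

def savings_fraction(prompt_tokens: int, full_completion: int, cut_completion: int,
                     input_price: float, output_price: float) -> float:
    """Fraction of the full request cost saved by stopping the completion early."""
    full = request_cost(prompt_tokens, full_completion, input_price, output_price)
    cut = request_cost(prompt_tokens, cut_completion, input_price, output_price)
    return (full - cut) / full

# Illustrative numbers only: halving a 500-token completion.
print(f"{savings_fraction(3000, 500, 250, 1, 2):.1%}")  # long prompt:  12.5%
print(f"{savings_fraction(100, 500, 250, 1, 2):.1%}")   # short prompt: 45.5%
```

With a prompt-heavy request the saving is small, which matches the skepticism above; it only becomes significant when the completion dominates the token count.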

boywitharupee 1167 days ago

Any new call to the API is considered fresh. I don't believe your session is saved.

newhouseb 1166 days ago

We're talking about the streaming API, which streams generated text token by token, not the normal one-shot API. I have no insider knowledge, but I would agree with your intuition on the normal API.
