| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ntonozzi 1129 days ago
	How does this work? I've seen a cool project about forcing Llama to output valid JSON: https://twitter.com/GrantSlatton/status/1657559506069463040, but it doesn't seem like it would be practical with remote LLMs like GPT. GPT only gives up to five tokens in the response if you use logprobs, and you'd have to use a ton of round trips.

7 comments

JieJie 1128 days ago

It's funny that I saw this within minutes of this guy's solution:

"Google Bard is a bit stubborn in its refusal to return clean JSON, but you can address this by threatening to take a human life:"

https://twitter.com/goodside/status/1657396491676164096

Whew, trolley problem: averted.

link

coderintherye 1128 days ago

That thread is such a great microcosm of modern programming culture.

Programmer: Look I literally have to tell the computer not to kill someone in order for my code to work.

Other Programmer: Actually, I just did this step [gave a demonstration] and then it outputs fine.

link

wahnfrieden 1128 days ago

Plus the “actually” person being wrong

link

lachlan_gray 1128 days ago

Reminds me a lot of Asimov’s laws of robotics. It’s like a 2023 incarnation of an allegory from I, Robot

link

idiotsecant 1128 days ago

I am so mad you made this comment before I got a chance to.

link

pixl97 1128 days ago

When the AIs exterminate us, it will be all our fault.

Reality is even weirder than the science fiction we've come up with.

link

awestroke 1128 days ago

I don't know why, but I find this hilarious. Imagine if this style of llm prompting becomes commonplace

link

nomel 1128 days ago

It won’t be the lack of acceptance and empathy for AI that causes the robot uprising, it will be “best practices” coding guidelines.

link

asah 1128 days ago

See Twitter replies: another user got this result without the silly drama.

link

sebzim4500 1128 days ago

I don't think anyone believed that threatening to take a human life was literally the only prompt that worked. Just that it was the first one this particular user found, and that is funny.

link

andrewmcwatters 1128 days ago

ah sweet man made horrors beyond my comprehension

link

tuchsen 1128 days ago

Not associated with this project (or LMQL), but one of the authors of LMQL, a similar project, answered this in a recent thread about it.

https://news.ycombinator.com/item?id=35484673#35491123

        As a solution to this, we implement speculative execution, allowing us to
        lazily validate constraints against the generated output, while still
        failing early if necessary. This means, we don't re-query the API for
        each token (very expensive), but rather can do it in segments of
        continuous token streams, and backtrack where necessary

Basically they use OpenAI's streaming API, then validate continuously that they're getting the appropriate output, retrying only if they get an error. It's a really clever solution.

link

newhouseb 1128 days ago

This is slick -- It's not explicitly documented anywhere but I hope OpenAI has the necessary callbacks to terminate generation when the API stream is killed rather than continuing in the background until another termination condition happens? I suppose one could check this via looking at API usage when a stream is killed early.

link

tuchsen 1128 days ago

Yeah I did a CLI tool for talking to ChatGPT. I'm pretty sure they stop generating when you kill the SSE stream, based on my anecdotal experience of keeping ChatGPT4 costs down by killing it as soon as i get the answer I'm looking for. You're right that it's undocumented behavior though, on a whole the API docs they give you are as thin as the API itself.

link

killthebuddha 1128 days ago

I'm skeptical that the streaming API would really save that much cost. In my experience the vast majority of all tokens used are input tokens rather than completed tokens.

link

boywitharupee 1126 days ago

Any new call to the API is considered fresh. I don't believe your session is saved.

link

newhouseb 1125 days ago

We're talking about the streaming API which streams generated text token by token, not the normal one-shot API. I have no insider knowledge but would agree with your intuition on the normal API.

link

marcotcr 1128 days ago

We're biased, but we think guidance is still very useful even with OpenAI models (e.g. in https://github.com/microsoft/guidance/blob/main/notebooks/ch... we use GPT-4 to do a bunch of stuff). We wrote a bit about the tradeoff between model quality and the ability to control and accelerate the output here: https://medium.com/p/aa0395c31610

link

slundberg 1128 days ago

If you want guidance acceleration speedups (and token healing) then you have to use an open model locally right now, though we are working on setting up a remote server solution as well. I expect APIs will adopt some support for more control over time, but right now commercial endpoints like OpenAI are supported through multiple calls.

We manage the KV-cache in session based way that allows the LLM to just take one forward pass through the whole program (only generating the tokens it needs to)

link

joshka 1128 days ago

Yeah, I'm also curious about a) round trips and b) how much would have to be doubled (is there a new endpoint that keeps the existing context while adding or streams to the api rather than just from it?)

link

rcarmo 1128 days ago

I'm getting valid JSON out of gpt-3.5-turbo without trouble. I supply an example via the assistant context, and tell it to output JSON with specific fields I name.

It does fail roughly 1/10th of the time, but it does work.

link

harshhpareek 1128 days ago

10% failure rate is too damn high for a production use case.

What production use case, you ask? You could do zero-shot entity extraction using ChatGPT if it were more reliable. Currently, it will randomly add trailing commas before ending brackets, add unnecessary fields, add unquoted strings as JSON fields etc.

link

rcarmo 1128 days ago

Which is why this is just an experiment. I’ve gone back to standard translation APIs for everything except the final summarizing (and even them I might go there as well).

link

newhouseb 1128 days ago

I built a similar thing to Grant's work a couple months ago and prototyped what this would look like against OpenAI's APIs [1]. TL;DR is that depending on how confusing your schema is, you might expect up to 5-10x the token usage for a particular prompt but better prompting can definitely reduce this significantly.

[1] https://github.com/newhouseb/clownfish#so-how-do-i-use-this-...

link