by rossirpaulo 1071 days ago
This is great! We had a similar thought and couldn't agree more with "LLMs prefer producing something rather than nothing." We have been consistently requesting responses in JSON format, which, despite its many advantages, sometimes forces the model to produce an output even when it shouldn't. This frequently results in hallucinations. Encouraging NULL returns, for example, is a great way to deal with that.

I've found that this is best dealt with along two axes with constrained options: for example, request both a string and a boolean, and if you get boolean false you can simply ignore the string. So when the LLM ignores you and prints a string like "This article does not contain mention of sharks", you can discard it easily.
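A minimal sketch of the string-plus-boolean approach, assuming the model is asked to reply with a JSON object holding a boolean "found" and a string "text" (both field names are hypothetical, not part of any particular API):

```python
import json
from typing import Optional


def extract_answer(raw: str) -> Optional[str]:
    """Return the model's string only when its boolean flag says there is an answer.

    Assumes a reply shaped like {"found": <bool>, "text": <string>}
    (hypothetical field names).
    """
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("found"), bool):
        raise ValueError("reply must be an object with a boolean 'found' field")
    if not data["found"]:
        # Flag is false: ignore whatever the model wrote in "text".
        return None
    text = data.get("text")
    if not isinstance(text, str):
        raise ValueError("'found' is true but 'text' is not a string")
    return text
```

Because the flag is checked before the string is read, a reply like {"found": false, "text": "This article does not contain mention of sharks"} is discarded without any string matching.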

If you tell it "Return what this says about sharks or nothing if it does not mention them", it will mess up.

Have you tried this sort of prompt?

User text: "Blah blah ... Sharks ... Surfing ..." Instruction: Return a JSON object containing an array of all sentences in the user text which mention sharks directly or by implication. Response: {"list_of_shark_related_sentences": [

Stop token: ']}'

The model will try to complete the JSON response and end it by closing the array and the object, which matches the stop token. This severely limits rambling, and if it does add a spurious field, the result is usually still valid JSON and you can just ignore the unwanted field.
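The prefill-and-stop-token trick can be sketched roughly like this. It assumes a plain text-completion endpoint that, like OpenAI's completions API, leaves the stop sequence out of the returned text, so the parser appends it again before calling json.loads. The model call itself is omitted, and build_prompt and parse_completion are hypothetical helper names.

```python
import json

PREFILL = '{"list_of_shark_related_sentences": ['
STOP = ']}'


def build_prompt(user_text: str) -> str:
    """Build a prompt that ends mid-JSON so the model only has to fill in the array."""
    return (
        f'User text: "{user_text}"\n'
        "Instruction: Return a JSON object containing an array of all sentences in "
        "the user text which mention sharks directly or by implication.\n"
        f"Response: {PREFILL}"
    )


def parse_completion(completion: str) -> list:
    """Reassemble prefill + completion + stop sequence and parse the result.

    Assumes the API strips the stop sequence from the returned text, so it is
    appended here to close the array and the object.
    """
    data = json.loads(PREFILL + completion + STOP)
    return data["list_of_shark_related_sentences"]
```

An empty completion parses to an empty list, which gives the model a natural way to say "nothing" without inventing content.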

wrt OpenAI, text-davinci-003 handles this well, the other models not so much.

Making it rank multiple attributes on a scale of 1-10 also works decently in my experience. Then one can simply k-means cluster (or similar) and evaluate the grouping to see how accurate its estimations are.
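A rough sketch of the clustering check in plain Python: each item's 1-10 scores form a vector, Lloyd's k-means groups the vectors, and cluster purity against a few hand-labeled items gives a crude accuracy measure. The scores and labels are made up for illustration, and the first-k initialization only keeps the demo deterministic; in practice you would use several random restarts or a library such as scikit-learn's KMeans.

```python
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Lloyd's k-means on lists of numeric scores; returns (labels, centroids)."""
    # Deterministic init from the first k points, only to keep the demo reproducible.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def purity(cluster_labels, true_labels):
    """Fraction of items whose cluster's majority label matches their own label."""
    hits = 0
    for c in set(cluster_labels):
        members = [t for cl, t in zip(cluster_labels, true_labels) if cl == c]
        hits += Counter(members).most_common(1)[0][1]
    return hits / len(true_labels)


# Made-up 1-10 scores from the model on three attributes for six items.
scores = [[9, 8, 7], [8, 9, 8], [2, 1, 3], [1, 2, 2], [9, 9, 9], [2, 2, 1]]
human = ["relevant", "relevant", "off-topic", "off-topic", "relevant", "off-topic"]
labels, _ = kmeans(scores, k=2)
print(purity(labels, human))  # 1.0
```

A purity well below 1.0 on a labeled sample would suggest the model's scores are not separating the items the way a human would.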
Yes, agreed. I'm doing this as well. Works excellently for NLP classifier tasks.

Funnily enough, there is a certain propensity for it to output round numbers (50, 100, etc.) so I have to ask it not to do this and provide examples ("like 27, 63, or 4"). Now that I think about it I should probably randomize those.
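One way to randomize the examples is to draw fresh non-round numbers each time the prompt is built. This sketch assumes a 0-100 scale and treats any multiple of 5 as "round"; both choices, and the instruction wording, are assumptions rather than anything tested in the thread.

```python
import random


def non_round_examples(n=3, low=1, high=99, rng=None):
    """Draw n distinct example scores that are not multiples of 5."""
    if rng is None:
        rng = random.Random()
    candidates = [x for x in range(low, high + 1) if x % 5 != 0]
    return rng.sample(candidates, n)


def scoring_instruction(rng=None):
    """Hypothetical instruction text with freshly randomized examples."""
    a, b, c = non_round_examples(3, rng=rng)
    return (
        "Rate each attribute from 0 to 100. Do not default to round numbers; "
        f"use precise values like {a}, {b}, or {c}."
    )


print(scoring_instruction(random.Random(42)))
```

Passing a seeded random.Random makes a prompt reproducible when debugging; leaving it out gives different examples on every call.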

Interesting, I've just been doing 1-10 (maybe I should include 0). Do you get the same result if you use floats instead of larger integers, e.g. 0.000 to 10.000?
Have you tried using GPT-4's new function calling feature? The "killer" part of it is guaranteed JSON based on a schema you pass to the model.
That's a good point! We're actually working on integrating this as well, but in practice, what we've found is that LLMs in general don't like to respond with empty strings, for example.

My hypothesis here is that due to RLHF, there's likely some implicit learning that tangentially related content is better than no content.

Given that, you'd likely still get better results with your schema being:

"string | null" so the LLM can output a null instead of "" since there is probably not as much training data that gives "" high log prob values.

But we're looking forward to evaluating function calling and seeing what the metrics show!
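For reference, JSON Schema expresses "string | null" as a type list. The function name and field below are hypothetical, whether a given API honors type lists is worth checking, and the reader also treats a stray blank string as "nothing found" in case the model ignores the null option.

```python
import json
from typing import Optional

# Hypothetical function definition; "type": ["string", "null"] is JSON Schema's
# way of saying "string | null".
SHARK_FN = {
    "name": "summarize_shark_mentions",
    "description": "Summarize what the text says about sharks.",
    "parameters": {
        "type": "object",
        "properties": {
            "summary": {
                "type": ["string", "null"],
                "description": "What the text says about sharks, or null if it says nothing.",
            }
        },
        "required": ["summary"],
    },
}


def read_summary(arguments_json: str) -> Optional[str]:
    """Parse the call's arguments; null or a blank string both mean 'nothing found'."""
    args = json.loads(arguments_json)
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    summary = args.get("summary")
    if summary is None:
        return None
    if not isinstance(summary, str):
        raise ValueError("summary must be a string or null")
    # Treat a stray empty or whitespace-only string like null.
    return summary if summary.strip() else None
```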

I integrated the function calling feature into my personal project and wrote a blog post about it here:

https://letscooktime.com/Blog/ai,/machine/learning,/chatgpt,...

Hopefully this saves you some time!

Thanks for the post! Really liked that it was short, precise, and to the point.

I'm also looking to integrate the new function calling feature, and I already picked up a few lessons from the post without even starting to code.

Nope, it's not guaranteed. They warn you in the OpenAI docs that it might hallucinate nonexistent parameters.
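Since made-up or missing parameters can slip through, one cheap guard is to compare the returned argument keys against the schema before using them. This is a minimal sketch that only checks key names, not value types; a full JSON Schema validator would be stricter.

```python
import json


def check_arguments(arguments_json, parameters_schema):
    """Reject arguments with keys the schema does not declare, or missing required keys.

    Only key names are checked here, not value types.
    """
    args = json.loads(arguments_json)  # json.JSONDecodeError is a ValueError
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    declared = set(parameters_schema.get("properties", {}))
    unknown = set(args) - declared
    if unknown:
        raise ValueError(f"unexpected parameters: {sorted(unknown)}")
    missing = set(parameters_schema.get("required", [])) - set(args)
    if missing:
        raise ValueError(f"missing required parameters: {sorted(missing)}")
    return args
```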
Constrained generation should not require calling supplemental functions. It's as simple as banning or reducing the weight of the naughty tokens. There are several libraries that enable this without function calling (Microsoft Guidance, Jsonformer, LMQL).
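To illustrate what banning tokens means, here is a toy greedy decoder that sets the logit of every token that cannot extend one of the allowed options to -inf. It uses whole words as "tokens" for readability; a real implementation has to map the constraint onto the model's subword vocabulary. Reducing weight instead of banning would mean subtracting a penalty rather than using -inf. The scoring function is a hypothetical stand-in for a model: it always prefers to stop, yet the mask forces it to finish an allowed value.

```python
import math

EOS = "<eos>"


def allowed_next(prefix, options):
    """Tokens that keep the output on track to one of the allowed options."""
    allowed = set()
    for opt in options:
        if opt[: len(prefix)] == prefix:
            allowed.add(opt[len(prefix)] if len(prefix) < len(opt) else EOS)
    return allowed


def constrained_decode(score_fn, vocab, options, max_len=10):
    """Greedy decoding where every disallowed token's logit is set to -inf (banned)."""
    out = []
    for _ in range(max_len):
        logits = score_fn(out)
        allowed = allowed_next(out, options)
        masked = [s if tok in allowed else -math.inf for s, tok in zip(logits, vocab)]
        best = max(range(len(vocab)), key=masked.__getitem__)
        if masked[best] == -math.inf:
            raise ValueError("no allowed token is in the vocabulary")
        if vocab[best] == EOS:
            break
        out.append(vocab[best])
    return out


# Toy "model" (hypothetical): it always most wants to stop, then to say "media".
VOCAB = ["media", "/", "other", "sports", EOS]


def eager_to_stop(prefix):
    return [3.0, 1.0, 1.0, 0.5, 5.0]


OPTIONS = [["media", "/", "other"], ["sports"]]
print(constrained_decode(eager_to_stop, VOCAB, OPTIONS))  # ['media', '/', 'other']
```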
The output is not 100% guaranteed. Be careful about that and have another layer to check the output.

I had a schema with a string enum property to categorise some inputs. One of the category names was "media/other" or something to that effect. Sometimes the output would stop at just media even though it wasn't a valid option in the schema.
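A minimal sketch of such a checking layer for the enum case: validate the category against the allowed values and ask again on failure. The category names and call_model are hypothetical placeholders for whatever client code actually queries the model.

```python
import json

CATEGORIES = {"news", "sports", "media/other"}  # hypothetical enum values


def validate_category(arguments_json, allowed=CATEGORIES):
    """Return the category only if it is exactly one of the schema's enum values."""
    args = json.loads(arguments_json)  # malformed JSON raises a ValueError subclass
    value = args.get("category") if isinstance(args, dict) else None
    if not isinstance(value, str) or value not in allowed:
        raise ValueError(f"category {value!r} is not one of {sorted(allowed)}")
    return value


def classify_with_retry(call_model, max_attempts=3):
    """call_model is a hypothetical zero-argument function returning raw arguments JSON."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return validate_category(call_model())
        except ValueError as err:
            last_error = err
    raise RuntimeError(f"no valid category after {max_attempts} attempts") from last_error
```

With this layer, a truncated "media" is rejected instead of silently passing through as a category.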

I've run into the same issue, but you can turn it into an advantage if you are careful enough.

Basically, give the LLM a schema that is loose enough for the LLM to expand where it feels expansion is needed. Saying always "return a number" is super limiting if the LLM has figured out you need a range instead. Saying "always populate this field" is silly because sometimes the field doesn't need to be populated.
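A sketch of what a looser schema might look like, assuming a JSON Schema style definition: the estimate may be either a single number or a min/max range, the notes field is optional, and a small normalizer turns both shapes into one (low, high) pair for downstream code. Field names are hypothetical.

```python
# Hypothetical JSON Schema: "estimate" may be a single number or a {min, max}
# range, and "notes" is optional instead of always required.
LOOSE_SCHEMA = {
    "type": "object",
    "properties": {
        "estimate": {
            "anyOf": [
                {"type": "number"},
                {
                    "type": "object",
                    "properties": {"min": {"type": "number"}, "max": {"type": "number"}},
                    "required": ["min", "max"],
                },
            ]
        },
        "notes": {"type": "string"},
    },
    "required": ["estimate"],
}


def normalize_estimate(args):
    """Map either estimate shape to a (low, high) pair of floats."""
    est = args["estimate"]
    if isinstance(est, bool):  # bool is a subclass of int in Python; reject it
        raise ValueError("estimate must be a number or a range, not a boolean")
    if isinstance(est, (int, float)):
        return (float(est), float(est))
    if isinstance(est, dict) and "min" in est and "max" in est:
        low, high = float(est["min"]), float(est["max"])
        # Tolerate a reversed range rather than rejecting it.
        return (min(low, high), max(low, high))
    raise ValueError(f"unsupported estimate: {est!r}")
```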