	by rossirpaulo 1071 days ago
	This is great! We had a similar thought and couldn't agree more with "LLMs prefer producing something rather than nothing." We have been consistently requesting responses in JSON format, which, despite its numerous advantages, sometimes forces the model to output something even when it shouldn't. This frequently results in hallucinations. Encouraging NULL returns, for example, is a great way to deal with that.

3 comments

caesil 1071 days ago

I've found that this is best dealt with along two axes with constrained options. i.e., request both a string and a boolean, and if you get boolean false you can simply ignore the string. So when the LLM ignores you and prints a string like "This article does not contain mention of sharks", you can discard that easily.

If you tell it "Return what this says about sharks or nothing if it does not mention them", it will mess up.
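
A minimal sketch of the two-field approach described above, assuming the model was asked for a JSON object with a boolean and a string. The key names `mentions_sharks` and `summary` and the helper `extract_summary` are hypothetical, chosen only for illustration.

```python
import json
from typing import Optional


def extract_summary(raw_response: str) -> Optional[str]:
    """Return the summary only when the model's boolean flag is true."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return None  # unparseable output is treated the same as "no answer"

    if not isinstance(data, dict):
        return None

    # Require a literal JSON true; strings such as "false" do not count.
    if data.get("mentions_sharks") is not True:
        return None

    summary = data.get("summary")
    if isinstance(summary, str) and summary.strip():
        return summary
    return None
```

With this shape, a filler string such as "This article does not contain mention of sharks" is dropped whenever the flag is false, so the string never has to be inspected.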

LawTalkingGuy 1071 days ago

Have you tried this sort of prompt?

User text: "Blah blah ... Sharks ... Surfing ..."
Instruction: Return a JSON object containing an array of all sentences in the user text which mention sharks directly or by implication.
Response: {"list_of_shark_related_sentences": [

Stop token: ']}'

It'll try to complete the JSON response and it'll try to end it by closing the array and object as shown in the stop token. This severely limits rambling, and if it does add a spurious field it'll (usually) still be valid JSON and you can usually just ignore the unwanted field.

wrt OpenAI, text-davinci-003 handles this well, the other models not so much.
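
A rough sketch of the prefill-plus-stop-token idea, leaving out the actual model call. The helper names are hypothetical, and the parsing step assumes the completion API returns text without the stop sequence, which is worth checking for whichever API is in use.

```python
import json

# The response prefix the model is asked to continue, and the stop sequence.
PREFILL = '{"list_of_shark_related_sentences": ['
STOP = "]}"


def build_prompt(user_text: str) -> str:
    """Assemble a completion-style prompt that ends mid-JSON."""
    return (
        f"User text: {json.dumps(user_text)}\n"
        "Instruction: Return a JSON object containing an array of all "
        "sentences in the user text which mention sharks directly or by "
        "implication.\n"
        f"Response: {PREFILL}"
    )


def parse_completion(completion: str) -> list:
    """Re-attach the prefill and the stop sequence, then parse the result."""
    # Assumption: the returned completion does not include the stop sequence,
    # so it is appended here to close the array and the object.
    data = json.loads(PREFILL + completion + STOP)
    return data["list_of_shark_related_sentences"]
```

An empty completion then parses to an empty list, which gives the model a cheap way to say "nothing found".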

dontupvoteme 1071 days ago

Making it rank multiple attributes on a scale of 1-10 also works decently in my experience. Then one can simply k-means cluster (or similar) and evaluate the grouping to see how accurate its estimations are.
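
As an illustration of the clustering step, here is a small pure-Python k-means (Lloyd's algorithm) over made-up 1-10 scores. The data and the first-k initialisation are assumptions made to keep the sketch deterministic; a real pipeline would more likely use an established implementation such as scikit-learn's `KMeans`.

```python
from typing import List, Sequence


def kmeans(points: Sequence[Sequence[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain Lloyd's algorithm; returns a cluster index for each point."""
    # Deterministic initialisation for the sketch: use the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical 1-10 scores for two attributes on five documents.
scores = [(9, 8), (2, 1), (8, 9), (1, 2), (9, 9)]
labels = kmeans(scores, k=2)
```

Comparing `labels` against a handful of hand-labelled documents is one way to check whether the model's scores group the way you expect.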

caesil 1071 days ago

Yes, agreed. I'm doing this as well. Works excellently for NLP classifier tasks.

Funnily enough, there is a certain propensity for it to output round numbers (50, 100, etc.) so I have to ask it not to do this and provide examples ("like 27, 63, or 4"). Now that I think about it I should probably randomize those.
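
One possible way to randomize those example values, sketched with the standard library only. The function names, the 1-99 range, and the "no multiples of 5" rule are assumptions for illustration.

```python
import random
from typing import List, Optional


def example_scores(n: int = 3, low: int = 1, high: int = 99,
                   seed: Optional[int] = None) -> List[int]:
    """Pick n distinct non-round example values to show in the prompt."""
    rng = random.Random(seed)
    # Exclude multiples of 5 so the examples themselves are not round numbers.
    candidates = [v for v in range(low, high + 1) if v % 5 != 0]
    return rng.sample(candidates, n)


def build_hint(seed: Optional[int] = None) -> str:
    """Format the instruction with freshly sampled example values."""
    a, b, c = example_scores(seed=seed)
    return f"Avoid round numbers; give precise values (like {a}, {b}, or {c})."
```

Sampling new examples per request should keep the model from anchoring on any one set of numbers, though that effect would need to be measured.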

dontupvoteme 1070 days ago

Interesting, I've just been doing 1-10 (maybe I should include 0) -- Do you get the same result if you floatify the larger integers, e.g. 0.000 - 10.000?

galleywest200 1071 days ago

Have you tried using GPT-4's new Function Call feature? The "killer" portion of this is guaranteed JSON based on a schema you pass to the model.

hellovai 1071 days ago

That's a good point! We're actually working on integrating this as well, but in practice, what we've found is that LLMs in general don't like to respond with empty strings, for example.

My hypothesis here is that due to RLHF, there's likely some implicit learning that tangentially related content is better than no content.

Given that, you'd likely still get better results with your schema being:

"string | null" so the LLM can output a null instead of "" since there is probably not as much training data that gives "" high log prob values.

But we're looking forward to evaluating function calling and seeing what the metrics show!
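
To make the "string | null" suggestion concrete, here is a hedged sketch: a JSON Schema fragment with a nullable string, plus a reader that treats null, a missing field, and "" alike. The field name `shark_mention` and the helper `read_nullable` are hypothetical.

```python
import json
from typing import Optional

# Hypothetical schema fragment: the field may be a string or null.
SCHEMA = {
    "type": "object",
    "properties": {"shark_mention": {"type": ["string", "null"]}},
    "required": ["shark_mention"],
}


def read_nullable(raw: str, field: str = "shark_mention") -> Optional[str]:
    """Return the field's text, mapping null, missing, and "" to None."""
    data = json.loads(raw)
    value = data.get(field) if isinstance(data, dict) else None
    # Anything that is not a non-empty string counts as "no content".
    if not isinstance(value, str) or not value.strip():
        return None
    return value
```

Normalizing all the "empty" cases to `None` on the reading side means the prompt can encourage null without downstream code caring which form the model picked.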

guhidalg 1071 days ago

I integrated the function calling feature into my personal project and wrote a blog post about it here:

https://letscooktime.com/Blog/ai,/machine/learning,/chatgpt,...

Hopefully this saves you some time!

CallMeMarc 1071 days ago

Thanks for the post! Really liked it being short and to the point.

Also looking to integrate the new function feature and now already got some learnings out of the post without even starting to code.

rolisz 1071 days ago

Nope, it's not guaranteed. They warn you in the OpenAI docs that it might hallucinate nonexistent parameters.

Der_Einzige 1071 days ago

Constrained generation should not require calling supplemental functions. It's as simple as banning or reducing the weight of the naughty tokens. There are several libraries which enable this without function calling (microsoft guidance, jsonformer, lmql).
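
A toy illustration of banning or down-weighting tokens before sampling. It works on a made-up three-token vocabulary and is not the API of guidance, jsonformer, or lmql; the function names are hypothetical.

```python
import math
from typing import Dict, Set


def mask_logits(logits: Dict[str, float], banned: Set[str],
                penalty: float = math.inf) -> Dict[str, float]:
    """Subtract a penalty from banned tokens; an infinite penalty bans them."""
    return {
        tok: (score - penalty if tok in banned else score)
        for tok, score in logits.items()
    }


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert logits to probabilities (assumes at least one finite logit)."""
    top = max(logits.values())
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Made-up next-token logits: the model "wants" to start with chatter.
toy_logits = {"{": 2.0, "Sure": 3.0, "null": 1.0}
probs = softmax(mask_logits(toy_logits, banned={"Sure"}))
best = max(probs, key=probs.get)
```

A finite `penalty` corresponds to "reducing the weight" rather than banning outright; the libraries named above apply the same idea against a grammar or template instead of a fixed ban list.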

msp26 1071 days ago

The output is not 100% guaranteed. Be careful about that and have another layer to check the output.

I had a schema with a string enum property to categorise some inputs. One of the category names was "media/other" or something to that effect. Sometimes the output would stop at just "media" even though it wasn't a valid option in the schema.
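
A minimal sketch of such a checking layer, assuming the model returns a JSON object with a `category` field. The allowed values other than "media/other" and the helper name are hypothetical.

```python
import json

# Hypothetical enum values; only "media/other" comes from the comment above.
ALLOWED_CATEGORIES = {"media/other", "news", "sports"}


def validate_category(raw: str) -> str:
    """Parse the model output and reject any category outside the enum."""
    data = json.loads(raw)
    category = data.get("category") if isinstance(data, dict) else None
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unexpected category: {category!r}")
    return category
```

On a `ValueError` the caller can retry the request, fall back to a default, or log the output for review, depending on how costly a wrong label is.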

com2kid 1071 days ago

I've run into the same issue, but you can turn it into an advantage if you are careful enough.

Basically, give the LLM a schema that is loose enough for the LLM to expand where it feels expansion is needed. Always saying "return a number" is super limiting if the LLM has figured out you need a range instead. Saying "always populate this field" is silly because sometimes the field doesn't need to be populated.
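
One way to consume such a loose schema, sketched under the assumption that a quantity field may arrive as null, a single number, or an object with `min` and `max` keys. That shape and the helper name are invented for illustration.

```python
from typing import Optional, Tuple


def normalize_quantity(value) -> Optional[Tuple[float, float]]:
    """Accept null, a number, or a {"min", "max"} range; return (low, high)."""
    if value is None:
        return None  # the field legitimately did not need to be populated
    if isinstance(value, bool):
        raise ValueError("booleans are not quantities")
    if isinstance(value, (int, float)):
        return (float(value), float(value))  # a single number is a point range
    if isinstance(value, dict) and {"min", "max"} <= value.keys():
        low, high = float(value["min"]), float(value["max"])
        if low > high:
            raise ValueError(f"inverted range: {low} > {high}")
        return (low, high)
    raise ValueError(f"unsupported quantity: {value!r}")
```

The schema stays permissive toward the model while the rest of the program only ever sees `None` or a `(low, high)` pair.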
