Hacker News
by caesil 1071 days ago
I've found that this is best dealt with along two axes with constrained options: request both a string and a boolean, and if the boolean comes back false you can simply ignore the string. So when the LLM ignores you and prints a string like "This article does not contain mention of sharks", you can discard it easily.

If you tell it "Return what this says about sharks or nothing if it does not mention them", it will mess up.
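
A rough sketch of the string-plus-boolean idea, assuming the model is asked to reply with a JSON object. The field names `mentions_sharks` and `summary` and the helper `extract_summary` are made up for illustration and don't come from any particular API:

```python
import json
from typing import Optional


def extract_summary(raw_response: str) -> Optional[str]:
    """Return the summary string only when the model's boolean flag is true."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return None  # unparseable output is treated the same as "no mention"
    if not isinstance(data, dict):
        return None
    # Trust the constrained boolean, not the free-form string.
    if data.get("mentions_sharks") is not True:
        return None
    summary = data.get("summary")
    if isinstance(summary, str) and summary.strip():
        return summary
    return None
```

The point is that the filler sentence never needs to be recognized: a false flag throws the string away no matter what it says.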

2 comments

Have you tried this sort of prompt?

User text: "Blah blah ... Sharks ... Surfing ..."
Instruction: Return a JSON object containing an array of all sentences in the user text which mention sharks directly or by implication.
Response: {"list_of_shark_related_sentences": [

Stop token: ']}'

It'll try to complete the JSON response and it'll try to end it by closing the array and object as shown in the stop token. This severely limits rambling, and if it does add a spurious field it'll (usually) still be valid JSON and you can usually just ignore the unwanted field.

wrt OpenAI, text-davinci-003 handles this well, the other models not so much.
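
Roughly how the prefill-plus-stop-token trick could be wired up, as a sketch. It assumes the completion API returns only the text generated after the prompt and strips the stop sequence, so the code glues the prefix and stop token back on before parsing. `build_prompt` and `parse_completion` are hypothetical helper names, and the actual API call is left out:

```python
import json

STOP_TOKEN = "]}"
RESPONSE_PREFIX = '{"list_of_shark_related_sentences": ['


def build_prompt(user_text: str) -> str:
    """Build a prompt that ends mid-JSON so the model has to continue the array."""
    return (
        f"User text: {json.dumps(user_text)}\n"
        "Instruction: Return a JSON object containing an array of all sentences "
        "in the user text which mention sharks directly or by implication.\n"
        f"Response: {RESPONSE_PREFIX}"
    )


def parse_completion(completion: str) -> list:
    """Rebuild the JSON object from the prefix and the completion."""
    text = RESPONSE_PREFIX + completion
    # Try the text as-is first (the model may have closed the object itself,
    # e.g. after adding a spurious field), then with the stop token restored.
    for candidate in (text, text + STOP_TOKEN):
        try:
            data = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict):
            # Extra fields are simply ignored.
            return data.get("list_of_shark_related_sentences", [])
    raise ValueError("completion did not form valid JSON")
```

Pass `STOP_TOKEN` as the stop sequence when requesting the completion, then feed the returned text to `parse_completion`.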

Making it rank multiple attributes on a scale of 1-10 also works decently in my experience. Then one can simply k-means cluster (or similar) and evaluate the grouping to see how accurate its estimations are.

Yes, agreed. I'm doing this as well. Works excellently for NLP classifier tasks.

Funnily enough, there is a certain propensity for it to output round numbers (50, 100, etc.) so I have to ask it not to do this and provide examples ("like 27, 63, or 4"). Now that I think about it I should probably randomize those.
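
A quick sketch of what randomizing those example numbers might look like. The 0-100 scale, the "not a multiple of 5" definition of a non-round number, and the instruction wording are all assumptions for illustration:

```python
import random


def example_scores(count=3, low=1, high=99, rng=None):
    """Pick distinct non-round example scores (here: not multiples of 5)."""
    rng = rng or random.Random()
    candidates = [n for n in range(low, high + 1) if n % 5 != 0]
    return rng.sample(candidates, count)


def build_instruction(rng=None):
    """Build a scoring instruction with freshly sampled example values."""
    examples = example_scores(rng=rng)
    listed = ", ".join(str(n) for n in examples[:-1]) + f", or {examples[-1]}"
    return (
        "Rate the text from 0 to 100. Do not default to round numbers; "
        f"use precise values like {listed}."
    )
```

Sampling fresh examples per request should keep the model from anchoring on one fixed set of numbers, though I haven't measured that.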

Interesting, I've just been doing 1-10 (maybe I should include 0). Do you get the same result if you floatify the larger integers, e.g. 0.000 - 10.000?