Hacker News
by crazyedgar 1139 days ago
I like the idea, but I think a library that focuses on producing requests and parsing responses according to schema is better. Sending requests to the server is orthogonal to the purpose.

What we've found useful in practice in dealing with similar problems:

- Use json5 instead of json when parsing. It allows trailing commas.

- Don't let it respond in true/false. Instead, ask it for a short sentence explaining whether it is true or false. Afterwards, use a small embedding model such as sbert to extract true/false from the sentence. We've found that GPT is able to reason better in this case, and it is much more robust.

- For numerical scores, do a similar thing: ask GPT for a description, write a few example descriptions matching each point on your score scale, and use the small embedding model to give each response the score of its best-matching example. If you let GPT give you scores directly without explanation, 20% of the time it will give you nonsense.
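
A minimal sketch of the example-matching idea in the last bullet, assuming Python. The `embed` function below is a toy bag-of-words stand-in so the snippet runs on its own; in practice it would be swapped for a real sentence-embedding model such as sbert. The names `SCALE_EXAMPLES`, `embed`, and `score_response` are hypothetical, and the example sentences are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical reference sentences, one or more per point on the score scale.
SCALE_EXAMPLES = {
    1: ["the answer is completely wrong and unhelpful"],
    3: ["the answer is partly correct but misses important details"],
    5: ["the answer is fully correct and very helpful"],
}

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def score_response(description):
    """Return the score whose example is most similar to the description."""
    vec = embed(description)
    best_score, best_sim = None, -1.0
    for score, examples in SCALE_EXAMPLES.items():
        for example in examples:
            sim = cosine(vec, embed(example))
            if sim > best_sim:
                best_score, best_sim = score, sim
    return best_score
```

The same function covers the true/false case if the scale is reduced to two entries, one with affirmative examples and one with negative ones.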

7 comments

> Don't let it respond in true/false. Instead, ask it for a short sentence explaining whether it is true or false. Afterwards, use a small embedding model such as sbert to extract true/false from the sentence. We've found that GPT is able to reason better in this case, and it is much more robust.

Have you tried just getting it to do both? It reasons far better given some space to think, so I often have it explain things first then give the answer. You're effectively then using gpt for the extraction too.

This hugely improved the class hierarchies it was creating for me, significantly improving the reuse of classes and using better classes for fields too.

This seems like a better approach. Introducing another unrelated model seems like it would just add an extra point of failure to watch out for.
There's a benefit in having a model that can output only true/false if that's all that's acceptable, but if I was doing this myself I'd want to see how far I could get with just one model (and then the simple dev approach of running it again if it fails to produce a valid answer, or feeding it back with the error message). If it works 99% of the time you can get away with rerunning pretty cheaply.
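
A rough sketch of the rerun-and-feed-back-the-error approach described above, assuming Python and JSON output. `call_model` is a hypothetical callable that takes a prompt string and returns the model's raw reply; it stands in for whatever API client is actually used.

```python
import json

def ask_with_retry(call_model, prompt, max_attempts=3):
    """Call the model until it returns valid JSON, feeding parse errors back.

    `call_model` is a hypothetical callable: prompt string in, raw reply out.
    """
    current_prompt = prompt
    for _ in range(max_attempts):
        reply = call_model(current_prompt)
        try:
            return json.loads(reply)
        except json.JSONDecodeError as err:
            # Show the model its invalid output and the parser's complaint.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON ({err}).\n"
                f"Previous reply:\n{reply}\n\nReply again with valid JSON only."
            )
    raise ValueError(f"no valid JSON after {max_attempts} attempts")
```

If failures are rare, the extra calls cost little; the attempt cap keeps a persistently bad prompt from looping forever.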
Thanks for the thoughts! I've deployed a few meta models that act like you're describing for second stage predictions, but for fuzzy task definitions I've actually seen similar luck with having GPT explicitly explain its rationale and then forcing it to choose a true/false rating. My payloads often end up looking like:

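  # Assuming pydantic, which the use of Field suggests
  from pydantic import BaseModel, Field

  class Payload(BaseModel):
    reasoning: str = Field(description="Why this value might be true or false")
    answer: bool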
Since it's autoregressive I imagine the schema helps to define the universe of what it's supposed to do, then the decoder attention when it's filling the `answer` can look back on the reasoning and weigh the sentiment internally. I imagine the accuracy specifics depend a lot on the end deployment here.
Didn't know about json5, so I had to deal with trailing commas in another way. I found that providing an example of an array without trailing commas was enough for GPT to pick up on it.

The tips on booleans and numerics are interesting! Will keep them in mind if I ever need to do that. I've definitely experienced a few quirks like that (E.g. ChatGPT 'helpfully' responding with "Here's your JSON" instead of just giving me JSON).
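
One possible way to cope with the "Here's your JSON" quirk mentioned above, sketched in Python with only the standard library: scan the reply for the first position where a JSON object or array parses cleanly and ignore the text around it. The function name `extract_json` is hypothetical.

```python
import json

def extract_json(reply):
    """Return the first JSON object or array found in a chatty reply."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(reply):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and
            # tolerates trailing text after it.
            value, _end = decoder.raw_decode(reply, start)
            return value
        except json.JSONDecodeError:
            continue  # Not valid JSON from this position; keep scanning.
    raise ValueError("no JSON object or array found in reply")
```

This does not fix trailing commas or other invalid JSON; it only strips the surrounding chatter.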

I’ve also found good results by asking for it to give the answer first, then to explain its answer. Best of both worlds, since I can just ignore everything following and it still seems to do the internal preparatory ‘thinking’.
There’s some really good info along the same lines in this course: https://learn.deeplearning.ai/chatgpt-prompt-eng
Another alternative JSON parser is the YAML parser. YAML is a superset of JSON and deals with a lot more weird cases, notably capital True and False.
Is it possible to give an example of what a small embedding model would look like? Curious how to make something like this!
We just use https://www.sbert.net/. Compare the embedding of the answer with the embeddings of YES versus NO.
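
A hedged sketch of that YES versus NO comparison in Python. It assumes the `sentence-transformers` package from sbert.net; the model name `all-MiniLM-L6-v2` is an assumption, not something stated in the thread, and `is_yes` and `sbert_embed` are hypothetical names. The `embed` parameter is pluggable so the logic can be tried without downloading a model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def sbert_embed(texts):
    """Embed texts with sentence-transformers (model name is an assumption)."""
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts)

def is_yes(answer, embed=sbert_embed):
    """True if the answer embeds closer to YES than to NO."""
    answer_vec, yes_vec, no_vec = embed([answer, "YES", "NO"])
    return bool(cosine(answer_vec, yes_vec) > cosine(answer_vec, no_vec))
```

In real use the model would be loaded once and reused rather than rebuilt on every call, and longer anchor phrases than the bare words YES and NO may separate the two classes better.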