| HN Mirror


by crazyedgar 1139 days ago

I like the idea, but I think a library that focuses on producing requests and parsing responses according to schema is better. Sending requests to the server is orthogonal to the purpose.

What we've found useful in practice in dealing with similar problems:

- Use json5 instead of json when parsing. It allows trailing commas.

- Don't let it respond in true/false. Instead, ask it for a short sentence explaining whether it is true or false. Afterwards, use a small embedding model such as sbert to extract true/false from the sentence. We've found that GPT is able to reason better in this case, and it is much more robust.

- For numerical scores, do a similar thing: ask GPT for a description, write a few example descriptions for each point on your score scale, and use the small embedding model to give each response the score of its best-matching example. If you let GPT give you scores directly without explanation, 20% of the time it will give you nonsense.
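A minimal sketch of the score-matching idea in the last bullet, not the commenter's actual code. The names `toy_embed`, `score_from_description`, and `EXAMPLES` are hypothetical, and `toy_embed` is a word-count stand-in so the snippet runs without any model; in practice you would pass in a real sentence-embedding function instead.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedder (assumption): lowercase word counts as a sparse vector."""
    return dict(Counter(re.findall(r"[a-z']+", text.lower())))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def score_from_description(
    description: str,
    examples: Dict[str, int],
    embed: Callable[[str], Vector] = toy_embed,
) -> int:
    """Return the score attached to the example most similar to the description."""
    target = embed(description)
    best = max(examples, key=lambda example: cosine(target, embed(example)))
    return examples[best]


# Hypothetical 1-to-5 scale: one hand-written example description per score.
EXAMPLES = {
    "The answer is completely wrong and unhelpful.": 1,
    "The answer is partly correct but misses key points.": 3,
    "The answer is fully correct and clearly explained.": 5,
}
```

Calling `score_from_description(gpt_description, EXAMPLES)` then yields an integer from your own scale, so the model never has to emit a number directly.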

7 comments

IanCal 1139 days ago

> Don't let it respond in true/false. Instead, ask it for a short sentence explaining whether it is true or false. Afterwards, use a small embedding model such as sbert to extract true/false from the sentence. We've found that GPT is able to reason better in this case, and it is much more robust.

Have you tried just getting it to do both? It reasons far better given some space to think, so I often have it explain things first then give the answer. You're effectively then using gpt for the extraction too.

This hugely improved the class hierarchies it was creating for me, significantly improving the reuse of classes and using better classes for fields too.


RobertDeNiro 1139 days ago

This seems like a better approach. Introducing another unrelated model seems like it would just add an extra point of failure to watch out for.


IanCal 1139 days ago

There's a benefit in having a model that can output only true/false if that's all that's acceptable, but if I was doing this myself I'd want to see how far I could get with just one model (and then the simple dev approach of running it again if it fails to produce a valid answer, or feeding it back with the error message). If it works 99% of the time you can get away with rerunning pretty cheaply.
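The rerun-or-feed-back-the-error approach might look roughly like this. It is a sketch under assumptions: `call_model` is a hypothetical callable that takes a prompt string and returns the model's reply as a string, and "valid" here only means "parses as a JSON object".

```python
import json
from typing import Callable


def ask_for_json(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call the model until its reply parses as a JSON object, feeding errors back."""
    current_prompt = prompt
    last_error = None
    for _ in range(max_attempts):
        reply = call_model(current_prompt)
        try:
            parsed = json.loads(reply)
            if not isinstance(parsed, dict):
                raise ValueError("expected a JSON object")
            return parsed
        except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
            last_error = err
            # Retry with the failed reply and the parser's complaint appended.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was:\n{reply}\n"
                f"It failed to parse: {err}\nReply with valid JSON only."
            )
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts: {last_error}")
```

If the first attempt succeeds most of the time, the extra calls are rare, which is the "rerun pretty cheaply" point.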


icyfox 1139 days ago

Thanks for the thoughts! I've deployed a few meta models that act like you're describing for second-stage predictions, but for fuzzy task definitions I've actually had similar luck having GPT explicitly explain its rationale and then forcing it to choose a true/false rating. My payloads often end up looking like:

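  from pydantic import BaseModel, Field

  class Payload(BaseModel):
    # Reasoning comes first so it is generated before the answer.
    reasoning: str = Field(description="Why this value might be true or false")
    answer: bool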

Since it's autoregressive I imagine the schema helps to define the universe of what it's supposed to do, then the decoder attention when it's filling the `answer` can look back on the reasoning and weigh the sentiment internally. I imagine the accuracy specifics depend a lot on the end deployment here.
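For illustration, here is one way to check a reply against a reasoning-then-answer payload using only the standard library. This is a stand-in, not the schema class from the comment: the dataclass, `SCHEMA_HINT`, and `parse_payload` are all hypothetical names, and the key-order check is an assumption based on the "look back on the reasoning" argument.

```python
import json
from dataclasses import dataclass


@dataclass
class Payload:
    reasoning: str
    answer: bool


# Hypothetical prompt fragment asking for the explanation before the verdict.
SCHEMA_HINT = (
    'Respond with JSON of the form {"reasoning": "<why this might be true or '
    'false>", "answer": <true or false>}. Write "reasoning" before "answer".'
)


def parse_payload(reply: str) -> Payload:
    """Parse a reply and insist that the reasoning was written before the answer."""
    data = json.loads(reply)
    reasoning, answer = data["reasoning"], data["answer"]
    if not isinstance(reasoning, str) or not isinstance(answer, bool):
        raise ValueError("reasoning must be a string and answer a boolean")
    keys = list(data)  # json.loads keeps the key order of the reply
    if keys.index("reasoning") > keys.index("answer"):
        raise ValueError("reasoning must come before answer")
    return Payload(reasoning=reasoning, answer=answer)
```

Whether enforcing the order is worth a retry probably depends on the task; other commenters report good results with the answer first.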


ZephyrBlu 1139 days ago

Didn't know about json5, so I had to deal with trailing commas in another way. I found that providing an example of an array without trailing commas was enough for GPT to pick up on it.

The tips on booleans and numerics are interesting! Will keep them in mind if I ever need to do that. I've definitely experienced a few quirks like that (E.g. ChatGPT 'helpfully' responding with "Here's your JSON" instead of just giving me JSON).
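Both quirks mentioned here (chatty preambles and trailing commas) can be papered over with a few lines of standard-library Python. This is a rough sketch, and `extract_json` is a hypothetical helper; the regex is deliberately crude, so a tolerant parser such as json5 is likely the more robust choice.

```python
import json
import re


def extract_json(reply: str):
    """Pull the first JSON object or array out of a chatty reply and parse it."""
    starts = [i for i in (reply.find("{"), reply.find("[")) if i != -1]
    if not starts:
        raise ValueError("no JSON object or array found")
    start = min(starts)
    end = max(reply.rfind("}"), reply.rfind("]"))
    candidate = reply[start : end + 1]
    # Crude fix: drop commas that sit directly before a closing bracket.
    # Caveat: this also rewrites string values that happen to contain ",]" or ",}".
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
    return json.loads(candidate)
```

For example, a reply that starts with "Here's your JSON:" and ends with a friendly sign-off still parses, as long as the surrounding chatter contains no brackets of its own.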


generalizations 1139 days ago

I’ve also found good results by asking for it to give the answer first, then to explain its answer. Best of both worlds, since I can just ignore everything following and it still seems to do the internal preparatory ‘thinking’.


iamflimflam1 1139 days ago

There’s some really good info along the same lines in this course https://learn.deeplearning.ai/chatgpt-prompt-eng


zh217 1139 days ago

Another alternative JSON parser is the YAML parser. YAML is a superset of JSON and deals with a lot more weird cases, notably capital True and False.


syntaxing 1139 days ago

Is it possible to give an example of what a small embedding model would look like? Curious how to make something like this!


crazyedgar 1139 days ago

We just use https://www.sbert.net/. Compare the embedding of the answer with the embeddings of YES versus NO.
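A rough sketch of that YES-versus-NO comparison follows. To keep it runnable without downloading a model, it uses a hypothetical word-count `toy_embed` as a stand-in; the assumption is that with sbert you would instead pass a batch-encoding function such as a loaded `SentenceTransformer` model's `encode` method, which takes a list of sentences and returns one vector per sentence.

```python
import math
import re
from typing import Callable, List, Sequence

Embedder = Callable[[List[str]], Sequence[Sequence[float]]]


def toy_embed(texts: List[str]) -> List[List[float]]:
    """Stand-in embedder (assumption): word counts over the batch's own vocabulary."""
    tokenized = [re.findall(r"[a-z]+", text.lower()) for text in texts]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    return [[float(tokens.count(word)) for word in vocab] for tokens in tokenized]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def extract_bool(sentence: str, embed: Embedder = toy_embed) -> bool:
    """True if the sentence embeds closer to YES than to NO (ties count as False)."""
    answer, yes, no = embed([sentence, "YES", "NO"])
    return cosine(answer, yes) > cosine(answer, no)
```

The toy embedder only catches literal "yes" and "no" tokens; the point of a real embedding model is that a sentence like "That claim is accurate" should also land nearer YES.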
