I like the idea, but I think a library that focuses on producing requests and parsing responses according to a schema would be better. Sending requests to the server is orthogonal to that purpose.

What we've found useful in practice when dealing with similar problems:

- Use json5 instead of json when parsing. It allows trailing commas.
- Don't let it respond with a bare true/false. Instead, ask it for a short sentence explaining whether the statement is true or false. Afterwards, use a small embedding model such as sbert to extract true/false from the sentence. We've found that GPT reasons better this way, and the result is much more robust.
- For numerical scores, do a similar thing: ask GPT for a description, write a few example descriptions matching your score scale, and use the small embedding model to assign each response the score of its best-matching example. If you let GPT give you scores directly without explanation, 20% of the time it will give you nonsense.
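A minimal sketch of the "match against labelled examples" idea from the last two points, in case it helps. Everything here is an assumption for illustration: `toy_embed` is a bag-of-words stand-in so the snippet runs without downloads (in practice you would swap in a real sentence embedding model such as sbert), and `best_label`, `TRUTH_EXAMPLES` and `SCORE_EXAMPLES` are hypothetical names, not part of any library.

```python
import math
import re
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in for a real embedding model: lowercase word counts."""
    return dict(Counter(re.findall(r"[a-z']+", text.lower())))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def best_label(
    response: str,
    examples: List[Tuple[str, Any]],
    embed: Callable[[str], Vector] = toy_embed,
) -> Any:
    """Return the label of the example sentence most similar to the response."""
    r = embed(response)
    return max(examples, key=lambda ex: cosine(r, embed(ex[0])))[1]


# Hypothetical reference sentences; write your own to match your task.
TRUTH_EXAMPLES = [
    ("yes, the statement is true", True),
    ("the claim is correct", True),
    ("no, the statement is false", False),
    ("the claim is incorrect", False),
]

# One or more example descriptions per point on the score scale.
SCORE_EXAMPLES = [
    ("the answer is poor and mostly wrong", 1),
    ("the answer is acceptable but incomplete", 3),
    ("the answer is excellent and complete", 5),
]

if __name__ == "__main__":
    print(best_label("The statement is false because the dates do not match", TRUTH_EXAMPLES))
    print(best_label("The answer is excellent", SCORE_EXAMPLES))
```

The same function covers both the true/false case and the score case; only the example list changes. With a real embedding model the examples do not need to share words with the response, which is the point of using embeddings over keyword matching.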
Have you tried just getting it to do both? It reasons far better given some space to think, so I often have it explain things first, then give the answer. You're then effectively using GPT for the extraction too.
This hugely improved the class hierarchies it was creating for me, with significantly better reuse of classes and better-chosen classes for fields too.
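Roughly what I mean by "explain first, then answer", as a sketch. The prompt wording, the `ANSWER:` line convention, and the names `build_prompt` and `parse_reply` are all my own assumptions for illustration; sending the prompt to the model is left out.

```python
import re
from typing import Optional, Tuple


def build_prompt(question: str) -> str:
    """Ask for reasoning first and a machine-readable final line last."""
    return (
        f"{question}\n\n"
        "First explain your reasoning in a few sentences.\n"
        "Then finish with a final line of the form "
        "'ANSWER: true' or 'ANSWER: false'."
    )


# Matches a whole line such as "ANSWER: true" (case-insensitive).
_ANSWER_RE = re.compile(
    r"^\s*ANSWER:\s*(true|false)\s*\.?\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_reply(reply: str) -> Tuple[str, Optional[bool]]:
    """Split a reply into (reasoning, answer); answer is None if missing."""
    matches = list(_ANSWER_RE.finditer(reply))
    if not matches:
        return reply.strip(), None
    last = matches[-1]  # use the final answer line if there are several
    reasoning = reply[: last.start()].strip()
    return reasoning, last.group(1).lower() == "true"
```

Returning `None` when the answer line is missing lets the caller retry or fall back to something like the embedding approach above, instead of guessing.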