by sandkoan 1063 days ago
Problems with OpenAI:

1) You're wasting GPT tokens on outputting JSON instead of meaningful information.

2) GPT functions aren't guaranteed to return JSON in the schema you want. In 1% to 3% of cases they hallucinate fields, etc.

3) Function calling is tied to JSON, whereas this also allows you to output data in arbitrary non-JSON formats.

4) You can't self-host OpenAI functions.
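
To make point 2 concrete, here is a minimal sketch of the kind of check you end up writing around function-call output. It uses only the standard library; `check_fields` and the example field names are hypothetical, not part of any OpenAI API.

```python
import json


def check_fields(raw: str, expected: dict) -> list:
    """Return a list of problems; an empty list means `raw` matches `expected`.

    `expected` maps field names to Python types (hypothetical mini-schema).
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    # Fields the schema requires but the model left out or mistyped.
    for key, typ in expected.items():
        if key not in data:
            problems.append(f"missing field: {key}")
        elif not isinstance(data[key], typ):
            problems.append(f"wrong type for {key}")
    # Fields the model made up.
    for key in data:
        if key not in expected:
            problems.append(f"unexpected field: {key}")
    return problems


schema = {"name": str, "age": int}
print(check_fields('{"name": "Ada", "age": 36}', schema))
print(check_fields('{"name": "Ada", "nickname": "A"}', schema))
```

A non-empty result is where you would retry or repair, which is exactly the overhead that constrained generation is meant to avoid.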


Thanks, all good points that would seem to make this library a good fit for certain use-cases.

As with the other poster, I’d be interested to hear a bit more about point 1.

Got it, thanks. Certainly a very interesting and active space. I was playing around with FLARE (https://arxiv.org/abs/2305.06983) for RAG this week, and LMQL (mentioned by another poster) seems to use a similar technique.
In response to your sister comment: the implementation we used was the naive one from LangChain (https://python.langchain.com/docs/modules/chains/additional/...). We've decomposed that to use as a starting point, and early results are promising, yes, although it doesn't yet seem to be possible to get the necessary `logprobs` out of the GPT-4 API, so we're stuck with 3.5-turbo atm.
Ahh, I've been meaning to try FLARE—was it a marked improvement over traditional RAG?
Point 1 doesn't feel like a good enough reason. The number of tokens output as JSON is so small if you tell GPT to output it properly.
Costs add up surprisingly quickly. A quote-colon-space-quote combo alone is four tokens wasted. Now scale that up....
Using tiktokenizer, these are only two tokens: quote-colon is token 498, space-quote is token 330 (as per https://tiktokenizer.vercel.app/ ). But I agree with the general argument.

I think what factors in even more when you use the API is that you do not have fine-grained control over the generation process. If you follow the MS guidance approach, you fill in the structured text yourself, and then let the model generate only the value parts, e.g. up to the next quote. Doing that through the API, more or less word by word, takes multiple API calls, and you have to be very smart about providing the right stop tokens.
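
Roughly, the template-filling idea looks like the sketch below. This is not how guidance is implemented; `complete` is a hypothetical stand-in for one API call that takes the prompt so far plus a stop string, and `fake_complete` is a canned model so the example runs offline. It also assumes values contain no quotes or escapes.

```python
import json
from typing import Callable, Dict, List

# Hypothetical signature: (prompt_so_far, stop_string) -> generated text,
# cut off before the stop string.
Complete = Callable[[str, str], str]


def fill_template(prompt: str, fields: List[str], complete: Complete) -> Dict[str, str]:
    """Write the JSON scaffolding ourselves; ask the model only for values."""
    text = prompt + "{"
    result: Dict[str, str] = {}
    for i, name in enumerate(fields):
        # Key, colon, and opening quote come from us, not from the model.
        text += ("" if i == 0 else ", ") + json.dumps(name) + ': "'
        # One call per value; generation stops at the closing quote.
        value = complete(text, '"')
        result[name] = value
        text += value + '"'
    return result


def fake_complete(prompt: str, stop: str) -> str:
    """Canned stand-in for a model call, keyed on how the prompt ends."""
    canned = {'"name": "': "Ada", '"city": "': "London"}
    for suffix, value in canned.items():
        if prompt.endswith(suffix):
            return value.split(stop)[0]
    return ""


print(fill_template("Describe a person as JSON:\n", ["name", "city"], fake_complete))
```

The field names can't be hallucinated because the model never writes them, but you pay with one call per value, which is the cost described above.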