by jamifsud 1087 days ago
Anyone know of any good “tolerant” JSON parsers? I’d love to be able to stream a JSON response down to the client and have it be able to parse the JSON as it goes and handle the formatting errors that we sometimes see.
3 comments

There's no bulletproof solution to this. JSON5 (https://www.npmjs.com/package/json5) gets you slightly more leniency, as does plugging the currently streamed content into another, smaller LLM. I also wrote a deterministic parser more tailored towards these partially complete LM outputs. It's certainly not perfect, but it handles 99% of cases well: https://github.com/piercefreeman/gpt-json. In particular, the "streaming" functionality there might be of interest to you.
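To make "deterministic parser for partially complete output" concrete, here is a minimal sketch of one common repair strategy: track open strings, objects, and arrays, then append the missing closers before handing the text to the standard parser. This is not gpt-json's implementation; `close_partial_json` and `parse_partial` are hypothetical names used only for illustration, and the sketch assumes the only defect is truncation.

```python
import json
from typing import Any, Optional


def close_partial_json(fragment: str) -> str:
    """Append the closers a truncated JSON fragment is missing (hypothetical helper)."""
    stack = []          # closers still owed, innermost last
    in_string = False
    escaped = False
    for ch in fragment:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack:
            stack.pop()

    repaired = fragment
    if in_string:
        if escaped:
            repaired = repaired[:-1]  # drop a dangling backslash
        repaired += '"'               # terminate the open string
    else:
        # A trailing comma before a closer is invalid JSON, so drop it.
        repaired = repaired.rstrip().rstrip(",")
    return repaired + "".join(reversed(stack))


def parse_partial(fragment: str) -> Optional[Any]:
    """Best-effort parse; returns None when the fragment cannot be repaired yet."""
    try:
        return json.loads(close_partial_json(fragment))
    except json.JSONDecodeError:
        return None


print(parse_partial('{"name": "Al'))
print(parse_partial('{"a": 1, "b": [1, 2'))
```

Fragments that stop right after a key or a colon (for example `{"a":`) still fail to parse and return `None`, and a number cut off mid-digit will parse as the shorter number, so this only covers the easy cases.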
This looks really cool, thanks for open-sourcing this. I’ve been similarly parsing and validating output from OpenAI’s new functions using a schema defined on a custom Pydantic class, but I can see that your code has a lot of niceties coming from proper battle-testing, including elegant error handling, transformations and good docs.

I’d like to incorporate this in a production workflow for generating schema-compliant test data for use in few-shot prompting - would you mind saying a few words about your medium-term plans for the library? The LangChain API is changing all the time at the moment so we’re trying to figure out where it’s safe to stand. No expectations, of course, just curious.

Sure - I'm using it in a few different internal tools and know others are using it in production. The API should be relatively stable at this point since I intentionally kept the scope pretty limited. The main changes over time will be improved robustness and error correction, as reported issues surface different JSON schema breaks that we can fix automatically. Let me know if you see more cases that can be addressed here, would love to collaborate on it.
Thanks! Absolutely, will do. I’ll have a play with it today and reach out with a PR any time it makes sense to do so.

I noticed some occasional funkiness from GPT-4 around sending back properly formatted dates yesterday but haven’t yet dug into it properly. Might be a good candidate for a transformation.

> I’d love to be able to stream a JSON response down to the client and have it be able to parse the JSON as it goes

why though?

In a non-chat setting where the LLM is performing some reasoning or data extraction, it allows you to get JSON directly from the model and stream it to the UI (updating the associated UI fields as new keys come in) while caching the response server side in the exact same JSON format. It’s really simplified our stream + cache setup!
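A rough sketch of the "update fields as new keys come in" idea, assuming the model returns a single top-level JSON object delivered as text chunks. `stream_top_level_fields` is a hypothetical helper, not anyone's production code; it relies only on the standard library's `json.JSONDecoder.raw_decode` and yields a key once its value is complete and followed by `,` or `}` (so a number split across chunks is not emitted early).

```python
import json
from typing import Any, Iterable, Iterator, Tuple

_WS = " \t\r\n"


def _skip(text: str, i: int, chars: str) -> int:
    """Advance i past any characters in `chars`."""
    while i < len(text) and text[i] in chars:
        i += 1
    return i


def stream_top_level_fields(chunks: Iterable[str]) -> Iterator[Tuple[str, Any]]:
    """Yield (key, value) pairs of a top-level object as soon as each is complete."""
    decoder = json.JSONDecoder()
    buffer = ""
    pos = 0          # index of the next unconsumed character
    started = False  # True once the opening '{' has been seen
    for chunk in chunks:
        buffer += chunk
        while True:
            i = _skip(buffer, pos, _WS + ",")
            if not started:
                if i >= len(buffer):
                    break
                if buffer[i] != "{":
                    raise ValueError("expected a JSON object")
                started, pos = True, i + 1
                continue
            if i >= len(buffer) or buffer[i] == "}":
                break
            try:
                key, j = decoder.raw_decode(buffer, i)
                j = _skip(buffer, j, _WS)
                if j >= len(buffer) or buffer[j] != ":":
                    break
                j = _skip(buffer, j + 1, _WS)
                value, k = decoder.raw_decode(buffer, j)
            except json.JSONDecodeError:
                break  # key or value is still incomplete; wait for more text
            k = _skip(buffer, k, _WS)
            if k >= len(buffer) or buffer[k] not in ",}":
                break  # the value may still be growing (e.g. 12 -> 123)
            yield key, value
            pos = k


chunks = ['{"title": "Inv', 'oice", "total": 12', '3, "paid": tr', 'ue}']
for key, value in stream_top_level_fields(chunks):
    print(key, value)  # in a real app: update the matching UI field here
```

The server-side cache can simply store the concatenated chunks, since that is the same JSON document the UI was fed piece by piece.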
My JsonReader in libgdx and JsonBeans can parse a more relaxed version of JSON. It uses Ragel.