Hacker News
by zuck_vs_musk 818 days ago
> strong guarantees on the output (i.e. 100% reliable, without requiring any retries).

Has anyone seen a good JSON library that can handle slightly broken JSON, e.g. trailing commas or unescaped newlines? I have not found a good one.

4 comments

Entirely broken JSON, no; I would be surprised if one existed. If you want slightly laxer semantics like trailing commas, JSON5 [1] is a pretty good spec and is JSON-compatible. I used to use it for LLMs (while telling them to emit JSON, since there is no need to confuse them by explaining JSON5) in order to handle things like trailing commas, but in my experience LLMs have gotten good enough over the last year that I mostly don't even bother anymore.

[1]: https://json5.org/

FWIW, you can force LLMs to generate valid JSON by using a context-free grammar.
Please elaborate
Matt Rickard has a good entry-level blog post about it, from the angle of regex constraining [0]. Context-free grammars follow the same principle: a state machine (a pushdown automaton rather than a finite one, since a CFG needs a stack) restricts which tokens the model may emit next.

[0]: https://matt-rickard.com/rellm
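
To make the idea concrete, here is a minimal sketch of constrained decoding, not how ReLLM or any real library does it. Everything in it is an assumption for illustration: the grammar is a tiny JSON subset (nested arrays of single digits), `toy_model` is a hypothetical stand-in for an LLM that ranks candidate tokens, and the bracket depth counter plays the role of the stack. Real implementations typically mask logits instead of walking a ranked list.

```python
import json
from typing import Callable, List, Optional, Tuple

# State = (open bracket depth, what the grammar expects next).
State = Tuple[int, str]
START: State = (0, "value")
DIGITS = "0123456789"

def step(state: State, ch: str) -> Optional[State]:
    """Advance the grammar by one character; None means the prefix is invalid."""
    depth, expect = state
    closed = (depth - 1, "comma_or_close") if depth > 1 else (0, "done")
    if expect in ("value", "value_or_close"):
        if ch in DIGITS:
            return (depth, "comma_or_close") if depth else (0, "done")
        if ch == "[":
            return (depth + 1, "value_or_close")
        if ch == "]" and expect == "value_or_close":
            return closed
        return None
    if expect == "comma_or_close":
        if ch == ",":
            return (depth, "value")  # a value must follow, so "[1,]" is rejected
        if ch == "]":
            return closed
    return None  # "done" or an unexpected character

def advance(state: State, token: str) -> Optional[State]:
    """Feed a whole token through the grammar, character by character."""
    for ch in token:
        nxt = step(state, ch)
        if nxt is None:
            return None
        state = nxt
    return state

def constrained_decode(rank: Callable[[str], List[str]], max_tokens: int = 16) -> str:
    """Greedily pick the model's best-ranked token that keeps the output valid."""
    out, state = "", START
    for _ in range(max_tokens):
        if state[1] == "done":
            break
        for token in rank(out):
            nxt = advance(state, token)
            if nxt is not None:
                out, state = out + token, nxt
                break
        else:
            break  # no token in the vocabulary is allowed here
    return out

VOCAB = ["0", "1", "2", "]", ",", "["]

def toy_model(prefix: str) -> List[str]:
    """Hypothetical model that 'wants' to write "[1,2,]" (trailing comma)."""
    wanted = "[1,2,]"
    first = [wanted[len(prefix)]] if len(prefix) < len(wanted) else []
    return first + [t for t in VOCAB if t not in first]

result = constrained_decode(toy_model)
print(result, json.loads(result))
```

The toy model's preferred `]` after the trailing comma is rejected by the grammar, so the decoder falls back to the next allowed token and the output still parses with `json.loads`.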

Look at llama.cpp grammars, LMQL, or guidance-ai.
Should be easy to build on top of a lexer as a pre-parsing pass.
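
A rough sketch of what such a pre-parsing pass could look like, using only the standard library. The function name `repair_json` is made up for illustration, and the pass only handles the two problems mentioned at the top of the thread (trailing commas and raw control characters inside strings); anything else is passed through untouched for `json.loads` to accept or reject.

```python
import json

# Raw control characters that are illegal inside a JSON string, and their escapes.
ESCAPES = {"\n": "\\n", "\r": "\\r", "\t": "\\t"}

def repair_json(text: str) -> str:
    """Single lexer-style pass: track whether we are inside a string,
    escape raw newlines there, and drop commas that directly precede ] or }."""
    out = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False          # character after a backslash: keep as-is
                out.append(ch)
            elif ch == "\\":
                escaped = True
                out.append(ch)
            elif ch == '"':
                in_string = False
                out.append(ch)
            else:
                out.append(ESCAPES.get(ch, ch))
            continue
        if ch == '"':
            in_string = True
        elif ch in "]}":
            # Look back past whitespace; remove a trailing comma if present.
            i = len(out) - 1
            while i >= 0 and out[i].isspace():
                i -= 1
            if i >= 0 and out[i] == ",":
                del out[i]
        out.append(ch)
    return "".join(out)

broken = '{"a": [1, 2, ],\n "b": "line one\nline two",}'
print(json.loads(repair_json(broken)))
```

Because the pass tracks string state, commas and brackets inside string values are left alone, and input that is already valid JSON comes out unchanged.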
Dirtyjson