by janmo 958 days ago
I wrote a small JSON parser in C, which I called jsoncut. It just cuts a certain part out of a JSON file. I deal with large JSON files but only want to extract and parse certain parts of them. All the libraries I tried parse everything, use a lot of RAM, and are slow.

Link here, if you're interested in having a look: https://github.com/rgex/jsoncut


The words you’re looking for are "SAX-like JSON parser" or "streaming JSON parser". I don’t know if there are any command-line tools like the one you wrote that use one to provide a jq-like interface, though.
I tried jq and other command-line tools; all were extremely slow and seemed to always parse the entire file.

My parser just reads the file byte by byte until it finds the target, then outputs the content. When that's done it stops reading the file, meaning that it can be extremely fast when the targeted information is at the beginning of the JSON file.
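To make the "read byte by byte, stop once the target is found" idea concrete, here is a minimal sketch in C. It is not the actual jsoncut code: the function names (`extract_top_level` and its helpers) are hypothetical, it only looks up a key at the top level of a root object, it compares keys as raw bytes (a key containing escape sequences never matches), and it does not validate the JSON.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int is_space(int c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Returns the next non-whitespace byte, or EOF. */
static int next_nonspace(FILE *in)
{
    int c;
    do { c = fgetc(in); } while (is_space(c));
    return c;
}

/* Consumes a string body; the opening quote has already been read.
 * If match is non-NULL, *match reports whether the raw bytes equal key.
 * If out is non-NULL, the body and closing quote are echoed to it.
 * Returns 0 on success, -1 if the input ends inside the string. */
static int scan_string(FILE *in, const char *key, int *match, FILE *out)
{
    size_t i = 0, klen = key ? strlen(key) : 0;
    int ok = 1, c;

    while ((c = fgetc(in)) != EOF) {
        if (c == '"') {
            if (out) fputc(c, out);
            if (match) *match = ok && i == klen;
            return 0;
        }
        if (out) fputc(c, out);
        if (i >= klen || (unsigned char)key[i] != c) ok = 0;
        i++;
        if (c == '\\') {
            /* Take the escaped byte as-is so that \" does not end the string. */
            if ((c = fgetc(in)) == EOF) return -1;
            if (out) fputc(c, out);
            ok = 0; /* assumption: keys with escapes are never matched */
            i++;
        }
    }
    return -1;
}

/* Consumes one value whose first byte c has already been read.
 * Echoes it to out if out is non-NULL; pass NULL to skip the value. */
static int copy_value(FILE *in, int c, FILE *out)
{
    int depth = 0;

    if (c != '{' && c != '[' && c != '"') {
        /* Scalar: number, true, false, or null. */
        while (c != EOF && c != ',' && c != '}' && c != ']' && !is_space(c)) {
            if (out) fputc(c, out);
            c = fgetc(in);
        }
        if (c != EOF) ungetc(c, in); /* leave the delimiter for the caller */
        return 0;
    }
    do {
        if (out) fputc(c, out);
        if (c == '"') {
            if (scan_string(in, NULL, NULL, out) != 0) return -1;
        } else if (c == '{' || c == '[') {
            depth++;
        } else if (c == '}' || c == ']') {
            depth--;
        }
    } while (depth > 0 && (c = fgetc(in)) != EOF);
    return depth == 0 ? 0 : -1;
}

/* Writes the value of top-level `key` to out and stops reading.
 * Returns 1 if the key was found, 0 otherwise. */
int extract_top_level(FILE *in, const char *key, FILE *out)
{
    int c, match;

    assert(in != NULL && key != NULL && out != NULL);
    if (next_nonspace(in) != '{') return 0;
    for (;;) {
        c = next_nonspace(in);
        if (c == ',') continue;
        if (c != '"') return 0; /* closing brace, EOF, or malformed input */
        match = 0;
        if (scan_string(in, key, &match, NULL) != 0) return 0;
        if (next_nonspace(in) != ':') return 0;
        if ((c = next_nonspace(in)) == EOF) return 0;
        if (copy_value(in, c, match ? out : NULL) != 0) return 0;
        if (match) return 1; /* nothing after the value is ever read */
    }
}
```

Calling `extract_top_level(stdin, "target", stdout)` on `{"a": 1, "target": [1, 2], "rest": 3}` would print `[1, 2]` and return without touching `"rest"`, which is why this kind of scan is fast when the target sits near the start of the file.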

You're still describing a SAX parser (i.e. streaming). jq doesn't use a SAX parser because it's a multi-pass document editor at its core, which is why I said "jq-like" in the sense of supporting a similar syntax for single-pass queries. If you used RapidJSON's SAX parser in the body of your custom code (returning false once you've found what you're looking for), I'm pretty sure it would significantly outperform your hand-rolled code. Of course, your custom code is very small with no external dependencies and presumably fast enough, so tradeoffs.