by arun-mani-j 960 days ago
I remember reading a SO question which asks for a C library to parse JSON. A comment was like - C developers won't use a library for JSON, they will write one themselves.

I don't know how "true" that comment is but I thought I should try to write a parser myself to get a feel :D

So I wrote one, in Python - https://arunmani.in/articles/silly-json-parser/

It was a delightful experience though, writing it and then trying to break my own code with a variety of inputs. :)

4 comments

> I remember reading a SO question which asks for a C library to parse JSON. A comment was like - C developers won't use a library for JSON, they will write one themselves.

> I don't know how "true" that comment is

Either way it's a good way to get a pair of quadratic loops in your program: https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...

I wrote a small JSON parser in C myself, which I called jsoncut. It just cuts a certain part out of a JSON file. I deal with large JSON files but only want to extract and parse certain parts of them. All the libraries I tried parse everything, use a lot of RAM and are slow.

Link here, if you're interested in having a look: https://github.com/rgex/jsoncut

The words you’re looking for are "SAX-like JSON parser" or "streaming JSON parser". I don’t know if there are any command-line tools like the one you wrote that use one to provide a jq-like interface, though.
I tried jq and other command-line tools; all were extremely slow and seemed to always parse the entire file.

My parser just reads the file byte by byte until it finds the target, then outputs the content. When that's done it stops reading the file, meaning that it can be extremely fast when the targeted information is at the beginning of the JSON file.
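To make the "read until the target, output it, then stop" idea concrete, here is a minimal Python sketch. It is not jsoncut's actual code (that is written in C); the function names `cut`, `read_value`, `read_string` and `next_non_space` are made up for illustration. It assumes the target is a top-level key with no escape sequences in its name, and it does no validation of the JSON.

```python
import io

def next_non_space(stream):
    """Return the next non-whitespace character, or "" at end of input."""
    while True:
        ch = stream.read(1)
        if ch == "" or ch not in " \t\r\n":
            return ch

def read_string(stream):
    """Read the rest of a string after its opening quote; return it with quotes."""
    out = ['"']
    while True:
        ch = stream.read(1)
        if ch == "":
            raise ValueError("unterminated string")
        out.append(ch)
        if ch == "\\":
            out.append(stream.read(1))  # keep the escaped character verbatim
        elif ch == '"':
            return "".join(out)

def read_value(stream, first):
    """Capture the raw text of one JSON value whose first character is `first`."""
    if first == '"':
        return read_string(stream)
    if first in "{[":
        out, depth = [first], 1
        while depth:
            ch = stream.read(1)
            if ch == "":
                raise ValueError("unterminated container")
            if ch == '"':
                out.append(read_string(stream))  # brackets inside strings don't count
                continue
            out.append(ch)
            if ch in "{[":
                depth += 1
            elif ch in "}]":
                depth -= 1
        return "".join(out)
    out = [first]  # number, true, false or null: read up to the next delimiter
    while True:
        ch = stream.read(1)
        if ch == "" or ch in ",}] \t\r\n":
            return "".join(out)
        out.append(ch)

def cut(stream, key):
    """Return the raw text of the value for top-level `key`, or None if absent."""
    wanted = '"' + key + '"'
    depth = 0
    ch = next_non_space(stream)
    while ch != "":
        if ch == '"':
            text = read_string(stream)
            ch = next_non_space(stream)
            if ch == ":" and depth == 1 and text == wanted:
                # Found the target: capture its value and stop reading.
                return read_value(stream, next_non_space(stream))
            continue  # re-examine ch, which may be a closing bracket
        if ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
        ch = next_non_space(stream)
    return None
```

Calling `cut(open("big.json"), "id")` (file name hypothetical) returns as soon as the value is complete, so everything after the target is never read.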

You're still describing a SAX parser (i.e. streaming). jq doesn't use a SAX parser because it's a multi-pass document editor at its core, hence why I said "jq-like" in terms of supporting a similar syntax for single-pass queries. If you used RapidJSON's SAX parser in the body of your custom code (returning false once you found what you're looking for), I'm pretty sure it would significantly outperform your custom hand-rolled code. Of course, your custom code is very small with no external dependencies and presumably fast enough, so tradeoffs.
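For anyone unfamiliar with the SAX shape being described, here is a rough Python analogy: the parser pushes tokens at a handler, and the handler returns False to stop early. This is not RapidJSON's API (that is a C++ library); `parse_events` and `FindScalar` are hypothetical names, the regex tokenizer does no validation, and for brevity it works on a string already in memory rather than on a stream.

```python
import re

# One token: punctuation, a string literal, or a bare scalar (number/true/false/null).
TOKEN = re.compile(r'\s*([{}\[\],:]|"(?:[^"\\]|\\.)*"|[^\s{}\[\],:"]+)')

def parse_events(text, handler):
    """Call handler(token) for each token in order.

    Stops as soon as the handler returns False.
    Returns the number of tokens delivered.
    """
    pos, count = 0, 0
    while True:
        m = TOKEN.match(text, pos)
        if m is None:
            return count  # end of input (or something the regex can't tokenize)
        count += 1
        pos = m.end()
        if handler(m.group(1)) is False:
            return count

class FindScalar:
    """Hypothetical handler: keeps the token that follows the first `"key":`.

    It matches the key at any depth. If the value is an object or array,
    the result is just its opening bracket.
    """
    def __init__(self, key):
        self.wanted = '"' + key + '"'
        self.state = "search"  # search -> saw_key -> want_value
        self.result = None

    def __call__(self, token):
        if self.state == "search" and token == self.wanted:
            self.state = "saw_key"
        elif self.state == "saw_key":
            # A key is a string followed by a colon; otherwise it was a string value.
            self.state = "want_value" if token == ":" else "search"
        elif self.state == "want_value":
            self.result = token
            return False  # early exit: the rest of the input is never tokenized
        return True
```

The handler carries all the query logic, so the same driver loop can serve different lookups; that separation is the main design difference from a hand-rolled single-purpose scanner.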
I guess there are only so many ways to write a JSON parser, because the one I wrote on a train in Python looks very similar!

I thought it would be nice and simple, but it was even simpler than I expected. It's a fantastic spec if you need to throw a parser together yourself and don't have massive performance considerations.

Good for you but what does this have to do with the article?