by jchw 963 days ago
Reasons differ. C++ is a really hard place to be. It's gotten better, but if you can't tolerate exceptions, need code that is as obviously memory-safe as possible, or need to parse incrementally (think SAX style), off-the-shelf options like jsoncpp may not fit the bill.

Handling large documents is indeed another big one. It sort-of fits in the same category as being able to parse incrementally. That said, Go has a JSON scanner you can sort of use for incremental parsing, but in practice I've found it to be a lot slower, so for large documents it's a problem.

I've done a couple in hobby projects too. One time I did a partial one in Win32-style C89 because I wanted one that didn't depend on libc.

2 comments

Large documents are often handled by mapping the file with mmap/VirtualAlloc, but Boost.JSON has a streaming mode, is reasonably fast, and has a license that is good for pulling into anything. It's not the fastest, but it is faster than RapidJSON with an interface like nlohmann JSON. For most tasks, it does seem that most of the libraries taking a JSON document approach are wasting a lot of time/memory, given that what we want in the end is normal data structures, not a JSON document tree. If we skip that step and parse straight to the data structures, there is a big win in performance and memory with less/no code, just mappings. That's how I approached it at least.
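
To make the "document tree vs. straight to data structures" distinction concrete, here is a rough sketch. The comment is about C++ libraries. This uses Go's standard `encoding/json` purely as an illustration, not the approach or library described above. `Point`, `viaTree`, and `direct` are hypothetical names. The struct tags play the role of the "mappings".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Point is a hypothetical target type; the struct tags are the "mappings".
type Point struct {
	X float64 `json:"x"`
	Y float64 `json:"y"`
}

// viaTree builds a generic document tree first, then copies fields out of it.
func viaTree(data []byte) (Point, error) {
	var doc map[string]interface{}
	if err := json.Unmarshal(data, &doc); err != nil {
		return Point{}, err
	}
	x, okX := doc["x"].(float64)
	y, okY := doc["y"].(float64)
	if !okX || !okY {
		return Point{}, fmt.Errorf("missing or non-numeric x/y")
	}
	return Point{X: x, Y: y}, nil
}

// direct decodes straight into the target type with no intermediate tree.
func direct(data []byte) (Point, error) {
	var p Point
	err := json.Unmarshal(data, &p)
	return p, err
}

func main() {
	data := []byte(`{"x":1.5,"y":-2}`)
	a, errA := viaTree(data)
	b, errB := direct(data)
	fmt.Println(a, errA)
	fmt.Println(b, errB)
}
```

Both functions produce the same `Point`, but the tree version allocates a map plus boxed values and needs hand-written extraction code, which is the overhead the comment is pointing at.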
> that most of the libraries taking a JSON document approach are wasting a lot of time/memory

I agree. That's the same situation as with XML/HTML. In many cases you don't really need to build a DOM or JSOM in memory if your task is just deserializing into some native structures.

This XML scanner of mine does not allocate any memory at all while parsing HTML/XML: https://www.codeproject.com/Articles/14076/Fast-and-Compact-...

It is even simpler than a SAX parser.

For interesting JSON of a significant size, an iterator/range interface that parses to concrete types works really well. Usually these are large arrays or JSONL-like things.
If you have files that are large enough that JSON is a problem, why use JSON in the first place? Why not use a binary format that will be more compact and easier to memory map?
Chances are they can’t control that; they’re perhaps provided by a vendor.