by beached_whale 960 days ago
Large documents are often handled by mapping the file with mmap/VirtualAlloc, but Boost.JSON also has a streaming mode, is reasonably fast, and has a license that is fine for pulling into anything. It's not the fastest, but it is faster than RapidJSON and has an interface like nlohmann JSON. For most tasks, it seems that most of the libraries taking a JSON document approach waste a lot of time and memory, because what we want in the end is normal data structures, not a JSON document tree. If we skip the tree and parse straight to the data structures, there is a big win in performance and memory with little or no code, just mappings. That's how I approached it, at least.
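
To make the "just mappings" idea concrete, here is a minimal sketch, not the API of any library mentioned above. The names `Field`, `field`, `Cursor`, `read_value`, `parse_object` and the `User` struct are all hypothetical. It assumes a flat JSON object whose values are integers or strings without escape sequences, and it rejects unknown keys; a real parser needs far more than this.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical mapping entry: JSON member name -> pointer to struct member.
template <class T, class M>
struct Field { std::string_view name; M T::*member; };

template <class T, class M>
constexpr Field<T, M> field(std::string_view name, M T::*member) { return {name, member}; }

// Minimal cursor over the input text; it never copies the input.
struct Cursor {
    std::string_view s;
    void ws() {
        while (!s.empty() && (s.front() == ' ' || s.front() == '\n' ||
                              s.front() == '\t' || s.front() == '\r'))
            s.remove_prefix(1);
    }
    bool eat(char c) {
        ws();
        if (s.empty() || s.front() != c) return false;
        s.remove_prefix(1);
        return true;
    }
};

// Value readers for the member types this sketch supports.
inline bool read_value(Cursor& c, long long& out) {
    c.ws();
    auto [p, ec] = std::from_chars(c.s.data(), c.s.data() + c.s.size(), out);
    if (ec != std::errc{}) return false;
    assert(p > c.s.data());  // success means at least one character was consumed
    c.s.remove_prefix(static_cast<std::size_t>(p - c.s.data()));
    return true;
}
inline bool read_value(Cursor& c, std::string_view& out) {  // assumption: no escapes
    if (!c.eat('"')) return false;
    std::size_t end = c.s.find('"');
    if (end == std::string_view::npos) return false;
    out = c.s.substr(0, end);
    c.s.remove_prefix(end + 1);
    return true;
}
inline bool read_value(Cursor& c, std::string& out) {
    std::string_view v;
    if (!read_value(c, v)) return false;
    out = std::string(v);
    return true;
}

// Parses a flat object straight into `out`; no document tree is built.
template <class T, class... Fs>
bool parse_object(std::string_view json, T& out, const Fs&... fields) {
    Cursor c{json};
    if (!c.eat('{')) return false;
    if (c.eat('}')) return true;
    do {
        std::string_view key;
        if (!read_value(c, key) || !c.eat(':')) return false;
        bool matched = false, ok = true;
        auto try_field = [&](const auto& f) {
            if (!matched && f.name == key) {
                matched = true;
                ok = read_value(c, out.*(f.member));  // write directly into the member
            }
        };
        (try_field(fields), ...);
        if (!matched || !ok) return false;  // unknown key or malformed value
    } while (c.eat(','));
    return c.eat('}');
}

// Hypothetical target type.
struct User { long long id = 0; std::string name; };
```

Usage would look like `parse_object(text, user, field("id", &User::id), field("name", &User::name))`: the only per-type code is the list of mappings.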

> that most of the libraries taking a JSON document approach are wasting a lot of time/memory

I agree. It's the same situation as with XML/HTML: in many cases you don't really need to build a DOM or JSOM in memory if your task is just deserializing into native structures.

This XML scanner of mine does not allocate any memory at all while parsing HTML/XML: https://www.codeproject.com/Articles/14076/Fast-and-Compact-...

It is even simpler than a SAX parser.
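
For readers who have not seen the pull style, here is a rough sketch of the general idea. It is not the API of the scanner in the linked article; `Token`, `MarkupScanner` and `count_open_tags` are hypothetical names. It assumes well-formed input, does not split attributes out of a tag, and ignores comments, CDATA and entities. Tokens are `std::string_view` slices of the caller's buffer, so the scanner itself allocates nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

enum class Token { TagOpen, TagClose, Text, End };

// Hypothetical pull scanner: the caller asks for the next token in a loop.
class MarkupScanner {
public:
    explicit MarkupScanner(std::string_view input) : rest_(input) {}

    // Returns the next token; value() then views into the original buffer.
    Token next() {
        if (rest_.empty()) { value_ = {}; return Token::End; }
        if (rest_.front() == '<') {
            std::size_t end = rest_.find('>');
            if (end == std::string_view::npos) {  // unterminated tag: stop scanning
                value_ = {};
                rest_ = {};
                return Token::End;
            }
            std::string_view body = rest_.substr(1, end - 1);
            rest_.remove_prefix(end + 1);
            if (!body.empty() && body.front() == '/') {
                value_ = body.substr(1);  // name of the closing tag
                return Token::TagClose;
            }
            value_ = body;  // tag name plus any raw attribute text
            return Token::TagOpen;
        }
        // Text runs up to the next '<' or the end of input.
        value_ = rest_.substr(0, rest_.find('<'));
        assert(!value_.empty());  // guarantees the scanner always makes progress
        rest_.remove_prefix(value_.size());
        return Token::Text;
    }

    std::string_view value() const { return value_; }

private:
    std::string_view rest_;   // unread remainder of the input
    std::string_view value_;  // payload of the most recent token
};

// Usage: count opening tags with a plain loop, no callbacks or handler classes.
inline std::size_t count_open_tags(std::string_view doc) {
    MarkupScanner s(doc);
    std::size_t n = 0;
    for (Token t = s.next(); t != Token::End; t = s.next())
        if (t == Token::TagOpen) ++n;
    return n;
}
```

The difference from SAX is who owns the loop: here the caller pulls tokens and keeps its own state in local variables, instead of spreading it across callback methods.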

For interesting JSON of significant size, an iterator/range interface that parses to concrete types works really well. Usually these are large arrays or JSONL-like things.
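
As a rough illustration of that kind of interface, here is a sketch with a hypothetical `IntArrayRange`, which is not taken from any library in this thread. It assumes the input is a flat JSON array of integers, stops silently at the first thing it cannot parse, and is lenient about commas. Each element is decoded into `std::int64_t` only when the loop reaches it, so no array or document is materialized.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical lazy range over a JSON array of integers such as "[1, 2, 3]".
class IntArrayRange {
public:
    struct Sentinel {};
    struct Iterator {
        IntArrayRange* range;
        std::int64_t operator*() const {
            assert(range->current_.has_value());
            return *range->current_;
        }
        Iterator& operator++() { range->advance(); return *this; }
        bool operator!=(Sentinel) const { return range->current_.has_value(); }
    };

    explicit IntArrayRange(std::string_view json) : rest_(json) {
        skip_ws();
        if (!rest_.empty() && rest_.front() == '[') {
            rest_.remove_prefix(1);
            advance();  // decode the first element, if any
        }
    }

    Iterator begin() { return Iterator{this}; }
    Sentinel end() const { return {}; }

private:
    void skip_ws() {
        while (!rest_.empty() && (rest_.front() == ' ' || rest_.front() == '\n' ||
                                  rest_.front() == '\t' || rest_.front() == '\r'))
            rest_.remove_prefix(1);
    }

    // Decodes the next element into current_, or clears it at ']' or on bad input.
    void advance() {
        current_.reset();
        skip_ws();
        if (!rest_.empty() && rest_.front() == ',') {
            rest_.remove_prefix(1);
            skip_ws();
        }
        if (rest_.empty()) return;
        std::int64_t v = 0;
        auto [p, ec] = std::from_chars(rest_.data(), rest_.data() + rest_.size(), v);
        if (ec != std::errc{}) return;  // ']' or malformed input ends the range
        rest_.remove_prefix(static_cast<std::size_t>(p - rest_.data()));
        current_ = v;
    }

    std::string_view rest_;                // unread remainder of the input
    std::optional<std::int64_t> current_;  // element the iterator points at
};
```

With this, `for (std::int64_t v : IntArrayRange(text)) { ... }` walks the array once in a single pass. The same shape would extend to JSONL by making each step decode one line into a struct instead of one integer.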