by xakahnx 2276 days ago
I like string processing libraries that implement a multi-document feature like the one mentioned here. There's always some efficiency to be gained: maybe the public API has a lot of branching or initialization, maybe it acquires a lock, etc. Batching amortizes that cost and can open up new opportunities for SIMD processing. Letting the user reduce overhead through batching isn't something I see supported in other libraries, even ones that advertise high performance.

ndjson isn't really batching though; it's a derivative of JSON (or a superset?), and there are a few variations of it[1][2][3].

It's supported by quite a few libraries[4] and tools[5] too, plus many others that don't document it as being ndjson/jsonl (such as AWS CloudTrail logs). I've got support for it in the non-POSIX UNIX/Linux shell I've written as well[6].

The issue is really that, as with comments, JSON doesn't support streaming or even multiple documents (something formats like YAML already have built into the spec), so it becomes something you need to advertise if you're writing a parser, simply because it's not part of the standard JSON specification.

[1] https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_...

[2] http://ndjson.org/

[3] http://jsonlines.org/

[4] http://ndjson.org/libraries.html

[5] http://jsonlines.org/on_the_web/

[6] https://murex.rocks/docs/types/jsonl.html
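
To make the "not part of the standard" point concrete, here is a minimal sketch using Python's built-in json module. The helper name parse_ndjson is made up for illustration and is not taken from any of the libraries linked above; it assumes one document per "\n"-terminated line.

```python
import json


def parse_ndjson(text):
    """Parse newline-delimited JSON: one document per non-empty line.

    Splits on "\n" only, because other line separators (e.g. U+2028)
    may legally appear unescaped inside JSON strings.
    """
    return [json.loads(line) for line in text.split("\n") if line.strip()]


stream = '{"id": 1}\n{"id": 2}\n'

# A standard JSON parser accepts exactly one document, so the second
# object is rejected as extra data.
try:
    json.loads(stream)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

# Treating each line as its own document works.
docs = parse_ndjson(stream)
```

The strict call fails and the line-based one succeeds, which is why a parser has to say explicitly that it handles ndjson/jsonl.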

Good point. I was picturing some gather/scatter over strings which are not in adjacent memory (maybe a generous interpretation for my use-case). Concatenating small strings into ndjson may still come out ahead performance-wise.
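
A rough sketch of that concatenation step, assuming each input is already a valid JSON document. The name to_ndjson is hypothetical, and the line-by-line loop at the end only stands in for a parser's multi-document entry point; plain Python won't show the SIMD or amortization gains, only the data layout.

```python
import json


def to_ndjson(documents):
    """Join already-serialized JSON documents into one ndjson buffer.

    In valid JSON a raw newline can only be whitespace between tokens
    (inside strings it must be escaped), so replacing it with a space
    keeps each document on a single line without changing its meaning.
    """
    flat = (doc.replace("\r", " ").replace("\n", " ") for doc in documents)
    return "\n".join(flat) + "\n"


# Small documents that would otherwise each cost one parser call.
scattered = ['{"a": 1}', '{"b":\n  [1, 2]}', '"just a string"']
batch = to_ndjson(scattered)

# Stand-in for a single multi-document parse call over the whole buffer.
parsed = [json.loads(line) for line in batch.split("\n") if line]
```

The copy into one contiguous buffer is the price paid here; whether it beats per-string calls depends on the parser's per-call overhead.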