Holy cow! 2.5GB/s, that is amazing. Meanwhile I can barely get Chrome/NodeJS to parse 20MB in under 100ms :(. How useful (or useless) would simdjson be as a native addon for V8? I assume transferring the resulting object into JS land would kill all the speed gains?

I wrote my own JSON parser just last week to see if I could improve the NodeJS situation, and discovered some really interesting factoids:

(A) JSON.parse is CPU-blocking, so if you receive a large object, your server cannot handle any other web request until parsing finishes. This sucks.

(B) At first I fixed this by using setImmediate (or a shim), but discovered two annoying issues:
(1) Scheduling too many setImmediates will cause the event loop to block in the "check" phase, so you actually have to load-balance across turns of the event loop, like so (https://twitter.com/marknadal/status/1242476619752591360).
(2) Doing the above makes your code way slower, so the trick instead is to skip setImmediate and invoke your code directly 3333 times (some fraction of NodeJS's ~11K stack-depth limit), or for 1ms, before doing a real setImmediate.

(C) Now that we can parse without blocking, our parser's while loop (https://github.com/amark/gun/blob/master/lib/yson.js) advances in X-byte increments at a time (I found 32KB to be a sweet spot, not sure why).

(D) I'm seeing this pure JS parser run ~2.5X slower than native for big, complex JSON objects (20MB).

(E) Interestingly enough, I'm seeing it run 10X to 20X faster than native when parsing JSON records that have large values (e.g., an embedded image).

(F) Why? This happened when I switched my parser to stop doing per-byte checks once it encounters a `"` and instead jump to the next `"` with indexOf. So it would seem V8's built-in JSON parser still checks every character for a token, which slows it down?

(G) I hate switch statements, but woah, I got a minor but noticeable speed boost going from if/else token checks to a switch statement.

Happy to answer any other Qs! But compared to OP's 2.5GB/s parsing?! Ha, mine is a joke.
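For anyone who wants to try the "do a batch of work, then yield" idea from (B)(2) together with the chunked loop from (C), here is a minimal sketch. It is not the yson.js code: `processInSlices` and `onChunk` are hypothetical names, and the 32KB chunk size and 1ms budget are just the figures mentioned above, used as defaults. Because it uses a plain while loop rather than recursion, stack depth isn't a concern, and only one setImmediate from this helper is ever pending at a time, so it doesn't pile callbacks up in the check phase.

```javascript
// Hypothetical helper (not from yson.js): run onChunk(text, from, to) over
// `text` in chunkSize pieces, yielding to the event loop via setImmediate
// whenever roughly budgetMs of synchronous work has been done.
function processInSlices(text, onChunk, { chunkSize = 32 * 1024, budgetMs = 1 } = {}) {
  return new Promise((resolve, reject) => {
    let pos = 0;
    function slice() {
      const started = Date.now();
      try {
        while (pos < text.length) {
          const end = Math.min(pos + chunkSize, text.length);
          onChunk(text, pos, end);
          pos = end;
          // Budget spent and work remains: let other callbacks (e.g. incoming
          // requests) run, then resume on a later event-loop turn.
          if (pos < text.length && Date.now() - started >= budgetMs) {
            setImmediate(slice);
            return;
          }
        }
        resolve();
      } catch (err) {
        reject(err);
      }
    }
    slice();
  });
}

// Usage: count quote characters in ~900KB of JSON-ish text without blocking.
const sample = '{"a":"x"}'.repeat(100000);
let quoteCount = 0;
processInSlices(sample, (s, from, to) => {
  for (let i = from; i < to; i++) if (s.charCodeAt(i) === 34) quoteCount++;
}).then(() => console.log(quoteCount)); // 400000
```

A real parser would keep its partial state (position, stack of open objects/arrays) outside `slice` so each chunk picks up where the last one stopped; the counter above stands in for that state.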
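The string shortcut in (F) can be sketched roughly like this, assuming the parser has just hit an opening `"`: instead of checking every character, it jumps between quote characters with indexOf and only looks backwards to see whether a given quote is escaped (an odd number of preceding backslashes). `readString` is a hypothetical name and this is not necessarily how yson.js does it. Strings that contain a backslash are handed to JSON.parse for decoding, and the fast path does not reject raw control characters the way strict JSON would.

```javascript
// Hypothetical helper: given `text` and the index of an opening quote, return
// the decoded string value and the index just past its closing quote.
function readString(text, openQuote) {
  let i = openQuote + 1;
  for (;;) {
    const q = text.indexOf('"', i); // jump straight to the next quote
    if (q === -1) throw new SyntaxError("Unterminated string at " + openQuote);
    // The quote is escaped if preceded by an odd number of backslashes.
    let backslashes = 0;
    for (let j = q - 1; j > openQuote && text.charCodeAt(j) === 92; j--) backslashes++;
    if (backslashes % 2 === 0) {
      const raw = text.slice(openQuote + 1, q);
      // Fast path: no escapes, so the raw slice is the value (note: raw
      // control characters are not rejected here).
      // Slow path: let the built-in parser decode \n, \", \uXXXX, etc.
      const value = raw.indexOf("\\") === -1 ? raw : JSON.parse(text.slice(openQuote, q + 1));
      return { value, end: q + 1 };
    }
    i = q + 1; // escaped quote: keep searching after it
  }
}

// Usage: a record with a large value (think embedded image data).
const doc = '{"img":"' + "A".repeat(1000000) + '","note":"say \\"hi\\""}';
console.log(readString(doc, 7).value.length); // 1000000
```

For a value like the one above, the work is essentially one indexOf call plus one slice, which is consistent with the large-value speedups described in (E), though actual numbers will depend on the engine and input.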
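For (G), here is a minimal sketch of switch-based token dispatch on character codes. `tokenize` and `isDelimiter` are hypothetical names; the sketch only classifies tokens (it does not validate numbers or the literals true/false/null, nor build values), and whether a switch beats an if/else chain in a given engine and input is something to measure rather than assume.

```javascript
// Hypothetical sketch: one switch on the character code dispatches every
// token type, instead of a chain of if/else comparisons.
function isDelimiter(c) {
  switch (c) {
    case 0x7b: case 0x7d: case 0x5b: case 0x5d: // { } [ ]
    case 0x3a: case 0x2c: case 0x22:            // : , "
    case 0x20: case 0x09: case 0x0a: case 0x0d: // JSON whitespace
      return true;
    default:
      return false;
  }
}

function tokenize(text) {
  const tokens = [];
  let i = 0;
  while (i < text.length) {
    const c = text.charCodeAt(i);
    switch (c) {
      case 0x7b: case 0x7d: case 0x5b: case 0x5d: case 0x3a: case 0x2c:
        tokens.push(text[i]); // { } [ ] : ,
        i++;
        break;
      case 0x20: case 0x09: case 0x0a: case 0x0d:
        i++; // skip whitespace
        break;
      case 0x22: { // string: advance to the closing quote, stepping over escapes
        let j = i + 1;
        while (j < text.length && text.charCodeAt(j) !== 0x22) {
          j += text.charCodeAt(j) === 0x5c ? 2 : 1;
        }
        if (j >= text.length) throw new SyntaxError("Unterminated string at " + i);
        tokens.push("string");
        i = j + 1;
        break;
      }
      default: { // numbers and true/false/null, taken as one unvalidated run
        let j = i + 1;
        while (j < text.length && !isDelimiter(text.charCodeAt(j))) j++;
        tokens.push("scalar");
        i = j;
        break;
      }
    }
  }
  return tokens;
}

console.log(tokenize('[1, "two", null]').join(" ")); // [ scalar , string , scalar ]
```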