by jhokanson 1960 days ago
I'm curious what the biggest speed win was here (a different approach? better lookup tables?). I'm also curious how this compares to the many (?) JSON parsers that have rolled their own number parser because everyone knows the standard library is so slow (is this just more accurate? faster?). Regardless, kudos to the authors on their work!

He touched on JSON parsers in a previous post about fast_double_parser: "People who write fast parsers tend to roll their own number parsers (e.g., RapidJSON, sajson), and so we did. However, we sacrifice some standard compliance." (The "we" in this context refers to simdjson.)

https://lemire.me/blog/2020/03/10/fast-float-parsing-in-prac...

He followed up in a comment: "RapidJSON has at least two fast-parsing mode. The fast mode, which I think is what you refer to, is indeed quite fast, but it can be off by one ULP, so it is not standard compliant."
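
For anyone unfamiliar with the term: "off by one ULP" means the parser returned a double adjacent to the correctly rounded one. Here is a minimal Python sketch of measuring that gap. It assumes both values are finite and share a sign, and the helper name `ulp_distance` is made up for illustration (it is not from RapidJSON or simdjson).

```python
import math
import struct


def ulp_distance(a: float, b: float) -> int:
    """Count how many representable doubles separate a and b.

    Assumption: a and b are finite and have the same sign. Under that
    assumption the IEEE 754 bit patterns, read as integers, are ordered
    the same way as the magnitudes, so the difference of the bit
    patterns is the distance in ULPs.
    """
    if not (math.isfinite(a) and math.isfinite(b)):
        raise ValueError("both values must be finite")
    if math.copysign(1.0, a) != math.copysign(1.0, b):
        raise ValueError("both values must have the same sign")
    bits_a = struct.unpack("<q", struct.pack("<d", a))[0]
    bits_b = struct.unpack("<q", struct.pack("<d", b))[0]
    return abs(bits_a - bits_b)
```

A standards-compliant parser should always be at distance 0 from the correctly rounded result; the fast mode described in the quote can land at distance 1.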

The GitHub README for this new project says, "The fast_float library provides a performance similar to that of the fast_double_parser library."

https://github.com/fastfloat/fast_float

However, the benchmarks show a significant improvement relative to those in the fast_double_parser README:

https://github.com/lemire/fast_double_parser

I tried to run the benchmarks, but my CMake is apparently too old, and Homebrew barfed all over the living room rug when I tried to update it.

Wow, those are big performance differences (660 MB/s for fast_double_parser vs. 1042 MB/s for the newer fast_float), although the numbers for the other libraries being tested are all over the place, and even strtod more than doubled in speed between the two tests (70 MB/s in the fast_double_parser benchmark vs. 190 MB/s in the fast_float benchmark). It wouldn't surprise me if those two code bases are essentially the same.

That highlights the complexity of benchmarking in general and the importance of comparing within the same benchmark. I haven't looked at this in a while but I thought some of the newer JSON parsers were standards compliant (maybe not?).

Anyway, that other blog post answers my question: it looks like the big insight is that you use the fast approach (the one everyone uses) when you can, and fall back to the slow one only when you really have to. From that blog link:

"The full idea requires a whole blog post to explain, but the gist of it is that we can attempt to compute the answer, optimistically using a fast algorithm, and fall back on something else (like the standard library) as needed. It turns out that for the kind of numbers we find in JSON documents, we can parse 99% of them using a simple approach. All we have to do is correctly detect the error cases and bail out."

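To make the "try fast, bail out if unsure" idea concrete, here is a rough Python sketch. It is not the simdjson or fast_float code, and the bail-out test it uses (digits fit in 2^53 and the power of ten is at most 10^22, the classic exact-arithmetic fast path) is an assumption about what a reasonable check looks like, not necessarily what either library does. The names `fast_path` and `parse_double` are hypothetical, and Python's built-in `float()` stands in for the standard-library fallback.

```python
import re
from typing import Optional

# 10**0 .. 10**22 are all exactly representable as doubles.
_POW10 = [float(10 ** i) for i in range(23)]
# Every integer up to 2**53 is exactly representable as a double.
_MAX_EXACT_INT = 2 ** 53

_DECIMAL = re.compile(r"(-?)(\d+)(?:\.(\d+))?(?:[eE]([+-]?\d+))?")


def fast_path(text: str) -> Optional[float]:
    """Return the parsed double, or None when the fast path must bail out."""
    m = _DECIMAL.fullmatch(text)
    if m is None:
        return None  # not a plain decimal literal: let the fallback decide
    sign, int_part, frac_part, exp_part = m.groups()
    frac_part = frac_part or ""
    w = int(int_part + frac_part)            # all digits as one integer
    e = int(exp_part or 0) - len(frac_part)  # so that value = w * 10**e
    if w > _MAX_EXACT_INT or not -22 <= e <= 22:
        return None  # the "error case": exactness is not guaranteed
    # Both operands are exact, so a single multiply or divide is
    # correctly rounded by IEEE 754 arithmetic.
    value = float(w)
    value = value * _POW10[e] if e >= 0 else value / _POW10[-e]
    return -value if sign else value


def parse_double(text: str) -> float:
    """Optimistic parse: fast path first, slow library parser otherwise."""
    result = fast_path(text)
    return result if result is not None else float(text)
```

With this check, short decimals like `3.14159` or `-2.5e-3` stay on the fast path, while something like `1e300` or a 17+ digit literal returns `None` from `fast_path` and goes through `float()`.
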
Again, I swear I've seen this in one of the other JSON parsers but maybe I'm misremembering. And again, good for them for breaking it out into a header library for others to use.