by SloopJon
1963 days ago
He touched on JSON parsers in a previous post about fast_double_parser: "People who write fast parsers tend to roll their own number parsers (e.g., RapidJSON, sajson), and so we did. However, we sacrifice some standard compliance." (The "we" in this context refers to simdjson.) https://lemire.me/blog/2020/03/10/fast-float-parsing-in-prac...
He followed up in a comment: "RapidJSON has at least two fast-parsing mode. The fast mode, which I think is what you refer to, is indeed quite fast, but it can be off by one ULP, so it is not standard compliant."
The GitHub README for this new project says, "The fast_float library provides a performance similar to that of the fast_double_parser library." https://github.com/fastfloat/fast_float However, the benchmarks show a significant improvement relative to those in the fast_double_parser README: https://github.com/lemire/fast_double_parser
I tried to run the benchmarks, but my CMake is apparently too old, and Homebrew barfed all over the living room rug when I tried to update it.
|
That highlights the complexity of benchmarking in general and the importance of comparing within the same benchmark. I haven't looked at this in a while but I thought some of the newer JSON parsers were standards compliant (maybe not?).
Anyway, that other blog post answers my question as it looks like the big insight is that you use the fast approach (that everyone uses) when you can, and fall back to slow if you really have to. From that blog link:
"The full idea requires a whole blog post to explain, but the gist of it is that we can attempt to compute the answer, optimistically using a fast algorithm, and fall back on something else (like the standard library) as needed. It turns out that for the kind of numbers we find in JSON documents, we can parse 99% of them using a simple approach. All we have to do is correctly detect the error cases and bail out."
Again, I swear I've seen this in one of the other JSON parsers but maybe I'm misremembering. And again, good for them for breaking it out into a header library for others to use.
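For anyone who wants to see the shape of that "try fast, bail out to slow" idea, here is a minimal Python sketch. It is not the code from simdjson, fast_double_parser, or fast_float. It uses the classic shortcut where the decimal mantissa fits in 53 bits and the power of ten is at most 10^22, so both are exact doubles and a single multiply or divide is correctly rounded. The names `fast_path` and `parse_double` are made up for illustration, and Python's built-in `float()` stands in for the "standard library" fallback.

```python
import re

# Powers of ten that a double represents exactly: 10**0 through 10**22.
EXACT_POW10 = [float(10 ** k) for k in range(23)]
# Integers up to 2**53 convert to a double without rounding.
MAX_EXACT_MANTISSA = 2 ** 53

# Plain decimal literals only: optional sign, digits, optional fraction/exponent.
_DECIMAL = re.compile(r"(-?)([0-9]+)(?:\.([0-9]+))?(?:[eE]([+-]?[0-9]+))?\Z")


def fast_path(text):
    """Return the correctly rounded float, or None when the shortcut does not apply."""
    m = _DECIMAL.match(text)
    if m is None:
        return None
    sign, int_part, frac_part, exp_part = m.groups()
    frac_part = frac_part or ""
    # Fold the fraction digits into the mantissa and adjust the exponent.
    mantissa = int(int_part + frac_part)
    exponent = int(exp_part or 0) - len(frac_part)
    # Detect the cases the shortcut cannot handle and bail out.
    if mantissa > MAX_EXACT_MANTISSA or abs(exponent) > 22:
        return None
    value = float(mantissa)  # exact
    if exponent >= 0:
        value *= EXACT_POW10[exponent]   # one rounding step
    else:
        value /= EXACT_POW10[-exponent]  # one rounding step
    return -value if sign else value


def parse_double(text):
    """Try the shortcut first; fall back to float() for everything else."""
    result = fast_path(text)
    if result is None:
        return float(text), "slow"
    return result, "fast"
```

Calling `parse_double("3.14159")` goes through the shortcut, while something like `parse_double("1e23")` or a 20-digit integer falls through to `float()`. The whole trick is in the bail-out check: the fast branch only has to be right when it claims to be.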