by glangdale
2267 days ago
This is a good explanation of why it's fast. An interesting point about the design is that simdjson loses its branchlessness in "stage 2". I originally had a bunch of very elaborate plans to try to stay branchless much further into the parsing process, but it proved just too hard to make work. There are some promising things that ultra-modern Intel chips (meaning Icelake) and future iterations of ARM (SVE/SVE2) are adding to their SIMD abilities, so it might be worth revisiting this in a few years (there aren't too many Icelake boxes out there and SVE barely exists).
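To make the stage 1 / stage 2 contrast concrete, here is a rough scalar sketch. It is not simdjson's actual code: the names `structural_mask`, `lowest_set_bit`, and `max_depth` are made up for illustration, the block is limited to 64 bytes, and it ignores the fact that structural characters inside strings must not count. The first function builds a bitmask with no data-dependent branch per byte (the kind of work SIMD compares do on many bytes at once); the last one walks that mask and has to switch on each character, which is where the branches come back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// "Stage 1"-style pass (illustrative): one bit per input byte, set when the
// byte is a JSON structural character. The comparisons are combined with
// bitwise OR, so nothing here branches on the data itself.
inline std::uint64_t structural_mask(const std::string& block) {
    assert(block.size() <= 64);  // one bit per byte in a 64-bit mask
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < block.size(); ++i) {
        const char c = block[i];
        const std::uint64_t hit = static_cast<std::uint64_t>(
            (c == '{') | (c == '}') | (c == '[') | (c == ']') | (c == ':') | (c == ','));
        mask |= hit << i;
    }
    return mask;
}

// Portable stand-in for a count-trailing-zeros instruction. Requires x != 0.
inline int lowest_set_bit(std::uint64_t x) {
    int i = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++i;
    }
    return i;
}

// "Stage 2"-style pass (illustrative): visit each flagged byte and dispatch on
// it. The switch is a data-dependent branch. Returns the maximum nesting
// depth, or -1 if the brackets are unbalanced.
inline int max_depth(const std::string& block) {
    std::uint64_t mask = structural_mask(block);
    int depth = 0;
    int deepest = 0;
    while (mask != 0) {
        const int i = lowest_set_bit(mask);
        mask &= mask - 1;  // clear the lowest set bit
        switch (block[static_cast<std::size_t>(i)]) {
            case '{':
            case '[':
                ++depth;
                if (depth > deepest) deepest = depth;
                break;
            case '}':
            case ']':
                if (--depth < 0) return -1;
                break;
            default:  // ':' and ',' do not change the depth
                break;
        }
    }
    return depth == 0 ? deepest : -1;
}
```

For the input `{"a":1}` the mask has bits 0, 4 and 6 set (the `{`, the `:` and the `}`). How well the switch in `max_depth` predicts depends entirely on the shape of the document, which is the part that is hard to make branchless.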
Making it so you can handle all the brackets at once, all the strings at once, all the numbers at once, would make a big difference, and we're thinking about that. Another thing that could help is making the if statement more predictable using type information from the user. get<int>() could mean "I expect this next thing to be an integer, so parse it that way and just yell if it's not, please."
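As a rough sketch of the `get<int>()` idea: the `Cursor` class below is hypothetical and is not simdjson's API. The assumption is that the caller's type expectation lets the parser skip the "what kind of value is this?" dispatch and go straight to the integer routine, which complains on a mismatch. For simplicity it only accepts magnitudes up to the largest `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical cursor over JSON text, for illustration only.
class Cursor {
public:
    explicit Cursor(std::string text) : text_(std::move(text)), pos_(0) {}

    // Typed access: the caller names the expected type, so there is no
    // dispatch on the first character of the value.
    template <typename T>
    T get();

private:
    std::string text_;
    std::size_t pos_;
};

// "I expect an integer here": parse it that way and throw if it is not one.
template <>
inline int Cursor::get<int>() {
    assert(pos_ <= text_.size());
    std::size_t i = pos_;
    const bool negative = i < text_.size() && text_[i] == '-';
    if (negative) ++i;

    const std::size_t first_digit = i;
    long long value = 0;
    while (i < text_.size() && text_[i] >= '0' && text_[i] <= '9') {
        value = value * 10 + (text_[i] - '0');
        if (value > std::numeric_limits<int>::max()) {
            throw std::out_of_range("integer does not fit in int");
        }
        ++i;
    }

    // No digits at all (a string, an object, ...) is a type mismatch.
    if (i == first_digit) throw std::runtime_error("expected an integer");
    // A fraction or exponent means the value is a number, but not an integer.
    if (i < text_.size() && (text_[i] == '.' || text_[i] == 'e' || text_[i] == 'E')) {
        throw std::runtime_error("expected an integer");
    }

    pos_ = i;  // consume the value only on success
    return static_cast<int>(negative ? -value : value);
}
```

With this sketch, `Cursor("42").get<int>()` returns 42, while calling `get<int>()` on `"hi"` or on `1.5` throws `std::runtime_error`. The branches that remain are ones that almost always go the same way when the caller's expectation is right.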
It's difficult. But it's why I'm still so fascinated! Solving JSON thoroughly and completely will give us a lot of information on how to quickly parse XML, YAML, and other file formats.
We've clearly all been doing parsing wrong (me included) if there's this much of a gap. It's exciting to see something innovative and new in this domain, and even to be able to contribute to it :) @lemire deserves a ton of credit for making an actual project out of his work and promoting it; I likely wouldn't have heard of it otherwise.