by jkeiser
2267 days ago
Worth pointing out: when I first got involved, I thought SIMD alone was what made it fast. It turns out that while SIMD helps, it's a tool for achieving the real gain: eliminating branches (if statements) that make the processor stumble and fall in its attempts to sprint ahead (speculative execution). simdjson's first stage pushes this really far towards the limit, achieving 4 instructions per cycle by not branching. And yes, 1 cycle is the smallest amount of time a single instruction can take. It turns out a single thread runs multiple instructions in parallel at any given time, as long as you don't trip it up! Parsing is notoriously serial and branchy, which is what makes simdjson so out of the ordinary. It uses a sort of "microparallel algorithm": it runs a small parallel parse on a SIMD-sized chunk of JSON (16-32 bytes depending on architecture), then moves on to the next chunk. And yeah, you have to go back over a decade to find CPUs that don't have SIMD. simdjson runs on those too, it just obviously doesn't use SIMD instructions :)
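To make the "no branches per byte" idea concrete, here is a rough scalar sketch in C++. It is not simdjson's actual code: `structural_mask` is a hypothetical helper made up for illustration, it uses plain integer operations instead of SIMD instructions, and it ignores the string and escape handling a real first stage needs. It only shows the shape of the trick: classify every byte of a chunk with comparisons combined by bitwise OR, and collect the results into a bitmask, so the work done never depends on what the bytes are.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper (not part of simdjson): returns a bitmask in which bit i
// is set when chunk[i] is one of the JSON structural characters { } [ ] : ,
// Only the first 64 bytes of the chunk are examined.
// Assumption: quoted strings are NOT handled, so a ',' inside a string is
// still reported. A real parser has to mask those out.
inline std::uint64_t structural_mask(std::string_view chunk) {
    std::uint64_t mask = 0;
    const std::size_t n = chunk.size() < 64 ? chunk.size() : 64;
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned char c = static_cast<unsigned char>(chunk[i]);
        // Each comparison yields 0 or 1; bitwise OR combines them without an
        // `if`, so there is no data-dependent branch for the CPU to mispredict.
        const std::uint64_t hit = static_cast<std::uint64_t>(
            (c == '{') | (c == '}') | (c == '[') | (c == ']') |
            (c == ':') | (c == ','));
        mask |= hit << i;
    }
    return mask;
}
```

For the input `{"a":[1,2]}` the structural characters sit at offsets 0, 4, 5, 7, 9 and 10, so the mask has exactly those bits set. The only branch left is the loop counter, which is trivially predictable; a SIMD version would do the same comparisons on a whole register of bytes at once.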
An interesting point about the design of simdjson is that it loses its branchlessness in "stage 2". I originally had a bunch of very elaborate plans to try to stay branchless far further into the parsing process. It proved just too hard to make it work. There are some promising things that ultra-modern Intel chips (meaning Icelake) and future iterations of ARM (SVE/SVE2) are adding to their SIMD abilities, so it might be worth revisiting this in a few years (there aren't too many Icelake boxes out there and SVE barely exists).