|
|
|
|
|
by anonymoushn
735 days ago
|
|
> - Parsing an integer in JSON requires multiplication by 10 and addition / subtraction for each digit. Meanwhile, in protobuf, fixed64 and related types are either in host order or are just a ntohl away; int64 and other varint types only require bit twiddling (masks, shifts, etc). Do you think it's easier to parse "4294967296" from a string, or 5 bytes along the lines of {0x88, 0x80, 0x80, 0x80, 0x00}? For this one, actually I think the varint may be harder because you have to parse it before you know which byte the next value starts on, but recently there has been some progress in the area of fast varint parsers. For parsing decimal numbers, a good start is here http://0x80.pl/articles/simd-parsing-int-sequences.html. Some users at https://highload.fun/tasks/1/leaderboard are calculating the sum of parsed base-10 integers at about the speed of reading the string from RAM, but this task is subtly easier than parsing each number individually, which may only be doable at half or a quarter of the speed of reading the string from RAM, and then you'd have to pay a bit more to also write out the parsed values to another buffer. |
|
> While conversion from a string into an integer value is feasible with SIMD instructions, this application is unpractical. For typical cases, when a single value is parsed, scalar procedures — like the standard atoi or strtol — are faster than any fancy SSE code.
> However, SIMD procedures can be really fast and convert in parallel several numbers. There is only one "but": the input data has to be regular and valid, i.e. the input string must contain only ASCII digits.
There definitely are some benefits and speedups available with SIMD, but that intro doesn't inspire a whole lot of confidence in its relevance to JSON parsing, where the only case where you might have this regularity is if you definitely have an array of integers. (JSON strings are not restricted to ASCII, as they can and do include Unicode.)