HN Mirror

  // Check for the opening quote
  if (in[i] == '"') {
    i++;
    len = 0;
    while (true) {
      // Use SIMD to copy 32 bytes from input to output
      chunk = in[i..i+32];
      out[len..len+32] = chunk;
      // Note we already wrote all 32 bytes, and only NOW check whether there was a quote in there
      quote = chunk.find('"');
      if (quote != npos) {   // a quote at offset 0 is still a hit
        len += quote;
        break;
      }
      // No quote, so keep parsing the string
      len += 32;
      i += 32;
    }
  }

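Yes, this is exactly why simdjson wants padding: the loop reads and writes a full 32-byte block even when the closing quote sits near the end of the input. It certainly doesn't need the string to be embedded in the source or any such nonsense.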
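To make the padding requirement concrete, here is a minimal portable sketch of the same loop, assuming plain `memcpy`/`memchr` in place of SIMD loads and stores. `copy_quoted`, `make_padded`, and `kBlock` are hypothetical names for illustration, not simdjson's API. Like the pseudocode, it ignores backslash escapes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical block width, mirroring the 32 in the pseudocode.
constexpr std::size_t kBlock = 32;

// Portable stand-in for the SIMD loop: copies the body of a string whose opening
// quote is at in[0] and returns the body's length. Escapes are ignored.
// Assumed preconditions (not checked): a closing quote exists, at least kBlock
// bytes are readable after it, and out has room for the body plus kBlock bytes.
std::size_t copy_quoted(const char* in, char* out) {
    assert(in[0] == '"');
    std::size_t i = 1;    // skip the opening quote
    std::size_t len = 0;
    while (true) {
        char chunk[kBlock];
        std::memcpy(chunk, in + i, kBlock);     // may read past the closing quote
        std::memcpy(out + len, chunk, kBlock);  // may write past the end of the body
        const void* q = std::memchr(chunk, '"', kBlock);
        if (q != nullptr) {
            return len + static_cast<std::size_t>(static_cast<const char*>(q) - chunk);
        }
        len += kBlock;
        i += kBlock;
    }
}

// Copies s into a buffer followed by kBlock zero bytes of padding, so the block
// loop above never touches memory outside the allocation.
std::vector<char> make_padded(const std::string& s) {
    std::vector<char> buf(s.size() + kBlock, '\0');
    std::memcpy(buf.data(), s.data(), s.size());
    return buf;
}
```

Without the padding from `make_padded`, a closing quote near the end of an exactly sized buffer would make the last `memcpy` read past the allocation, which is precisely the overread under discussion.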

I wish there were a standardized attribute that C++ understood, one that pretty much just said "hey, this buffer isn't right next to some memory-managed disaster, and if you read off the end of it, you promise not to use the results".

It is awful practice to read off the end of a buffer and let those bytes affect your behavior, but it is almost always harmless to read extra bytes (and mask them off or ignore them) unless you're next to a page boundary or in some dangerous region of memory that's mapped to a device. Memory protection works at page granularity, so an overread that stays inside a page already holding valid data can't fault.
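A minimal sketch of the two checks this implies, assuming 4096-byte pages (common, but not universal; real code should ask the OS, e.g. `sysconf(_SC_PAGESIZE)` on POSIX). `load_stays_in_page` and `mask_valid` are hypothetical helpers: the first tests whether a `width`-byte load starting at an address stays within one page, and the second clears the match bits for bytes past the valid end of a 32-byte block so the garbage can't affect the result. Neither makes the overread defined behavior in ISO C++, and neither says anything about device-mapped memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 4096-byte pages. Query the OS for the real value in production code.
constexpr std::uintptr_t kPageSize = 4096;

// True if a width-byte load starting at addr stays inside the page containing addr.
// Protection is per page, so such a load cannot fault if addr itself is readable.
bool load_stays_in_page(std::uintptr_t addr, std::size_t width) {
    assert(width > 0 && width <= kPageSize);
    std::uintptr_t offset = addr & (kPageSize - 1);
    return offset + width <= kPageSize;
}

// match_bits has bit k set if byte k of a 32-byte block matched (e.g. was a quote).
// Clears the bits for bytes at or beyond `valid`, so bytes read past the end of
// the buffer cannot influence the result.
std::uint32_t mask_valid(std::uint32_t match_bits, unsigned valid) {
    assert(valid <= 32);
    if (valid == 32) return match_bits;  // shifting a 32-bit value by 32 would be UB
    return match_bits & ((std::uint32_t{1} << valid) - 1);
}
```

In a SIMD loop, `match_bits` would typically come from a byte comparison followed by something like `_mm256_movemask_epi8` on AVX2, which produces one bit per byte of the 32-byte vector.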

This attribute would also need to be understood by tools like Valgrind, which already does a pretty good job of tracking whether those garbage bytes actually get fed into a computation.