by whoopdedo 767 days ago
GCC putting a pointer at the top of the structure seems reminiscent of the way Pascal stored strings. A PString is the address of a character buffer like C, but the length of the string is stored at a negative offset. I may be remembering wrong but I think there was an older C++ STL that also used negative offsets.

As much as these snippets make clang look heavier, I wonder what it compiles to in practice when the compiler can make better inferences. If you can prove the state of the `is_small` bit those branches disappear. Even at runtime, which implementation is more performant? Real-world profiling may favor clang with branch prediction and speculative processing. Then again, speculation has become a dirty word lately.[1]

[1] Get it? "Dirty" because of the cache. I'm sorry, that pun was entirely unintentional.
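
To make the `is_small` branching concrete, here is a minimal sketch of a small-string layout. It is a toy, not the actual libc++ or libstdc++ layout: `ToyString`, `kInlineCap`, and the 22-character inline capacity are assumptions picked for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy layout, for illustration only.
class ToyString {
    static constexpr std::size_t kInlineCap = 22;  // assumed inline capacity
    struct Large { char* ptr; std::size_t len; };

    bool is_small_;
    union {
        char small_[kInlineCap + 1] = {};  // inline buffer, active by default
        Large large_;                      // heap pointer plus length
    };

public:
    explicit ToyString(const char* s) {
        assert(s != nullptr);
        const std::size_t n = std::strlen(s);
        is_small_ = (n <= kInlineCap);
        if (is_small_) {
            std::memcpy(small_, s, n + 1);   // copy including the terminator
        } else {
            char* p = new char[n + 1];
            std::memcpy(p, s, n + 1);
            large_ = Large{p, n};            // switches the active union member
        }
    }
    ToyString(const ToyString&) = delete;
    ToyString& operator=(const ToyString&) = delete;
    ~ToyString() { if (!is_small_) delete[] large_.ptr; }

    bool is_small() const { return is_small_; }

    // Each accessor tests the flag. When the constructor is inlined and the
    // argument is a short literal, an optimizer may be able to fold the test.
    const char* data() const { return is_small_ ? small_ : large_.ptr; }
    std::size_t size() const {
        return is_small_ ? std::strlen(small_) : large_.len;
    }
};
```

Whether the branch in `data()` actually disappears depends on the compiler and on how much of the call chain it can see, so the generated assembly is worth checking rather than assuming.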

2 comments

> A PString is the address of a character buffer like C, but the length of the string is stored at a negative offset. I may be remembering wrong but I think there was an older C++ STL that also used negative offsets.

In the Microsoft world, BSTR from COM/OLE does this. Though I think the length prefix is 4 bytes, and the payload is 16 bit wchars.

An interesting difference between BSTR and Pascal strings is that BSTR strings, in addition to the length prefix, are NUL terminated (for compatibility with C string APIs). And since Pascal strings track their length in the prefix, they can support strings with embedded NUL bytes.
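
A rough sketch of that layout in portable C++, assuming a 4-byte length prefix that counts bytes, a 16-bit payload, and a trailing NUL. It does not use the Windows API; `prefixed_alloc`, `prefixed_byte_len`, and `prefixed_free` are hypothetical helpers, not the real BSTR functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Memory block: [4-byte length in bytes][payload][16-bit NUL]
// The caller only ever sees a pointer to the payload.
char16_t* prefixed_alloc(const char16_t* src, std::uint32_t count) {
    assert(src != nullptr);
    const std::uint32_t bytes = count * sizeof(char16_t);
    auto* block = static_cast<unsigned char*>(
        std::malloc(sizeof(std::uint32_t) + bytes + sizeof(char16_t)));
    assert(block != nullptr);
    std::memcpy(block, &bytes, sizeof bytes);         // length prefix
    unsigned char* payload = block + sizeof(std::uint32_t);
    std::memcpy(payload, src, bytes);                 // may contain NULs
    std::memset(payload + bytes, 0, sizeof(char16_t)); // terminator for C APIs
    return reinterpret_cast<char16_t*>(payload);
}

// Read the length from the negative offset in front of the payload.
std::uint32_t prefixed_byte_len(const char16_t* s) {
    std::uint32_t bytes;
    std::memcpy(&bytes,
                reinterpret_cast<const unsigned char*>(s) - sizeof bytes,
                sizeof bytes);
    return bytes;
}

// Free the whole block, starting at the prefix rather than the payload.
void prefixed_free(char16_t* s) {
    std::free(reinterpret_cast<unsigned char*>(s) - sizeof(std::uint32_t));
}
```

Because the length lives in the prefix, a payload such as `u"ab\0cd"` keeps all five characters, while code that walks to the first NUL would only see two.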

Agree, re: clang. On the dominant 64-bit platforms it’s both smaller and eliminates more allocations.

When placing a bet on real workload performance, I’d take those attributes every day of the week and twice on Sundays.
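
Both attributes are easy to check on whatever standard library you build against. A small sketch, where the helper names are made up and `stored_inline` is only a heuristic based on where `data()` points:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Size of the string object itself, in bytes.
constexpr std::size_t string_object_size() { return sizeof(std::string); }

// Capacity of an empty string: a proxy for the inline buffer size.
std::size_t inline_capacity() { return std::string().capacity(); }

// Heuristic: does the character buffer live inside the string object?
bool stored_inline(const std::string& s) {
    const auto obj = reinterpret_cast<std::uintptr_t>(&s);
    const auto buf = reinterpret_cast<std::uintptr_t>(s.data());
    return buf >= obj && buf < obj + sizeof(std::string);
}

// Does a string of the given length avoid a heap allocation (per the
// heuristic above)?
bool fits_inline(std::size_t len) {
    const std::string s(len, 'x');
    assert(s.size() == len);
    return stored_inline(s);
}
```

Commonly reported figures for 64-bit targets are 24 bytes with 22 inline characters for libc++ and 32 bytes with 15 for libstdc++, but treat those as something to verify with the functions above rather than as a given.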

> it’s both smaller

not sure about "smaller" - a cache line is 32 bytes or larger on all modern 64-bit CPUs