by OskarS
2333 days ago
I think it's to avoid a branch in `size()`. If you use this trick, you have to check a flag to see whether it's a small in-situ string or a "regular" heap-allocated string, because they calculate size differently. In the standard library strings, the size is always stored at the same position in memory, so `size()` has essentially no overhead when inlined: it's just a memory load. Branches on their own aren't particularly expensive (especially not an easily predictable one like this), but they do screw with several compiler optimizations (auto-vectorization, for one). Given that, it seems like a reasonable decision, and the only way you'd know which one is faster is to test it.

I'd personally think it would be interesting to test artificially increasing the sizeof() of strings from 24 bytes (the minimum needed to store a pointer, size and capacity) to something like 64 bytes, just to get longer strings using the SSO. The trade-off in stack space seems like it would totally be worth it, and with 56-character "small strings", a huge number of strings could fit in there (virtually all names, for instance). You probably couldn't do it for std::string (it would wreak havoc with ABIs), but my hunch is that performance and memory usage might both benefit from the reduced memory allocations.
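To make the `size()` point concrete, here's a rough sketch of the two approaches. Everything in it is made up for illustration (`FlaggedString`, `FixedSizeString`, `kInlineCap`); it is not the layout of any real library. Real implementations of the first kind pack the flag into spare bits so the object stays at 24 bytes, while this sketch keeps the flag in a separate member for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical: short strings reuse the bytes that would otherwise hold
// pointer/size/capacity, so size() must first check which layout is live.
class FlaggedString {
public:
    static constexpr std::size_t kInlineCap = 15;  // assumed inline capacity

    explicit FlaggedString(const char* s) {
        assert(s != nullptr);
        const std::size_t n = std::strlen(s);
        if (n <= kInlineCap) {
            is_small_ = true;
            small_ = Small{};                      // make the inline layout active
            std::memcpy(small_.data, s, n + 1);
            small_.size = static_cast<unsigned char>(n);
        } else {
            is_small_ = false;
            char* p = static_cast<char*>(std::malloc(n + 1));
            if (p == nullptr) std::abort();
            std::memcpy(p, s, n + 1);
            heap_ = Heap{p, n, n};                 // make the heap layout active
        }
    }
    ~FlaggedString() { if (!is_small_) std::free(heap_.ptr); }
    FlaggedString(const FlaggedString&) = delete;
    FlaggedString& operator=(const FlaggedString&) = delete;

    // The size lives in a different place depending on the flag.
    std::size_t size() const { return is_small_ ? small_.size : heap_.size; }
    bool is_small() const { return is_small_; }

private:
    struct Heap  { char* ptr; std::size_t size; std::size_t cap; };
    struct Small { char data[kInlineCap + 1]; unsigned char size; };
    union { Heap heap_; Small small_; };
    bool is_small_;  // real implementations hide this in spare bits
};

// Hypothetical: the size is always stored at the same offset, and the
// pointer targets either the inline buffer or the heap. size() is one load.
class FixedSizeString {
public:
    static constexpr std::size_t kInlineCap = 15;  // assumed inline capacity

    explicit FixedSizeString(const char* s) {
        assert(s != nullptr);
        size_ = std::strlen(s);
        if (size_ <= kInlineCap) {
            ptr_ = local_;
        } else {
            ptr_ = static_cast<char*>(std::malloc(size_ + 1));
            if (ptr_ == nullptr) std::abort();
            cap_ = size_;
        }
        std::memcpy(ptr_, s, size_ + 1);
    }
    ~FixedSizeString() { if (ptr_ != local_) std::free(ptr_); }
    FixedSizeString(const FixedSizeString&) = delete;
    FixedSizeString& operator=(const FixedSizeString&) = delete;

    std::size_t size() const { return size_; }     // no flag check needed

private:
    char* ptr_;
    std::size_t size_;
    union { char local_[kInlineCap + 1]; std::size_t cap_; };
};
```

The cost of the second layout is that the pointer and size are never reused for characters, so for the same object size it holds fewer characters inline.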
|
Note that the branch can be implemented in a way that compilers will lower into a CMOV, making the code size issue almost moot. I implemented that in fbstring back in the day:
https://github.com/facebook/folly/commit/be4c6d6b3e21914df8a...
This is the entirety of size():
A bit more than a MOV, but not that much.

> I'd personally think it would be interesting to test artificially increasing the sizeof() of strings from 24 bytes (the minimum needed to store a pointer, size and capacity) to something like 64 bytes, just to get longer strings using the SSO
And libstdc++ is actually 32 bytes :( It's easy to template on the inline capacity (you can look at llvm::SmallVector or folly::small_vector), but vocabulary types should have the smallest possible footprint. The vast majority of instances are either empty or very small (think keys in a map).
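For anyone who wants to run the experiment, templating on the inline capacity could look roughly like this. `SmallString<N>` is a hypothetical name, not the API of llvm::SmallVector or folly::small_vector, and the sketch leaves out copying, growth and everything else a real string needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical SmallString<N>: strings of up to N characters are stored
// inside the object; longer ones go to the heap.
template <std::size_t N>
class SmallString {
public:
    explicit SmallString(const char* s) {
        assert(s != nullptr);
        size_ = std::strlen(s);
        if (size_ <= N) {
            ptr_ = local_;                         // characters live in the object
        } else {
            ptr_ = static_cast<char*>(std::malloc(size_ + 1));
            if (ptr_ == nullptr) std::abort();
            cap_ = size_;
        }
        std::memcpy(ptr_, s, size_ + 1);           // copy including the terminator
    }
    ~SmallString() { if (ptr_ != local_) std::free(ptr_); }
    SmallString(const SmallString&) = delete;
    SmallString& operator=(const SmallString&) = delete;

    std::size_t size() const { return size_; }
    bool is_inline() const { return ptr_ == local_; }
    const char* c_str() const { return ptr_; }

private:
    char* ptr_;
    std::size_t size_;
    union { char local_[N + 1]; std::size_t cap_; };  // inline buffer or capacity
};
```

Comparing `sizeof(SmallString<15>)` with `sizeof(SmallString<55>)` shows the footprint trade-off directly: the larger buffer keeps name-sized strings off the heap, but every instance pays for it, including the empty ones.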