Hacker News new | ask | show | jobs
by dmazzoni 766 days ago
This is so interesting!

I wonder if anyone has ever tried an implementation that prioritizes even more minimal memory usage for small strings?

Of the three implementations discussed, the smallest (on a 64-bit system) is 24 bytes.

How about 8 bytes: the union of a single pointer and a char[8]?

For strings of 6 or fewer characters, it uses the first 6 bytes to store the string, one for the null terminator, and the last byte to store the length, but with a trick (see below).

For strings of 7 or more characters, it uses all 8 to store the address of a larger block of memory storing the capacity, size, and string bytes.

The trick is to use the last byte as a way to know whether the bytes are encoding a short string or a pointer. Since pointers will never be odd, we just store an odd number in that last byte. For example, you could store (size << 1) | 0x1

So if the last bit is 1, it's a short string. The size is bytes[7] >> 1, and the data() pointer is just equal to the string itself.

If the last bit is 0, treat the whole data as a pointer to a structure that encodes the capacity and size and then string bytes as usual.

1 comments

Having to do a load to get the metadata is probably why. Say you have a big array of strings and you care about the compactness of that array for cache reasons, so making the string structs small matters. You probably want to do some operation on all of them, e.g. accumulate the length of all of them so you can determine how much space is required to write them out. Now this either requires a byte scan for the null to determine length, or it requires a load from the heap (that’s probably cold) for the metadata, and a second load from the heap (that is also probably cold).

Generally when you do memory compactification of non-serialized structures it’s for cache related performance improvements, so unless the common case is overwhelmingly small strings there’s probably not a benefit. It’s worth testing to see where this is an improvement, but my guess is that it’s in extremely limited regimes, but maybe some bit manipulations instead of traditional byte scan could yield improvements, likewise it may be worth going to larger SSO sizes for SIMD and a larger scope of SSO cases.

If you’re interested in data structure optimizations like this there’s a cppcon talk about Facebook’s string implementation that has some clever tricks.