| Sure, to simplify lets assume a 64-bit CPU (this all works for 32-bit but that's less common these days and the actual numbers are different) C++ std::string can contain up to 15 (other popular implementations) or 22 bytes (libc++ from Clang) of inline text, and the data structure itself is either 24 bytes (Clang again) or 32 bytes of storage. Here's Raymond Chen: https://devblogs.microsoft.com/oldnewthing/20240510-00/?p=10... CompactString is 24 bytes of storage with all 24 bytes as potential inline text. When the 24 bytes are valid UTF-8 text, then that's the content of the CompactString e.g. "https://example.org/cool", if they aren't the last byte will be invalid UTF-8, and this signals whether some of the other 23 bytes were inline UTF-8 (and if so how many) or whether they should be interpreted as a pointer, size and capacity. ColdString is a radically different idea, it's 8 bytes of unaligned storage and it's one of three things: 1. 8 bytes of UTF-8 text, as before we can tell by whether it's valid UTF-8 text or 2. 0-7 bytes of UTF-8 text, prefixed by an invalid UTF-8 byte telling us how many of the remaining bytes are text or 3. An encoded pointer to a length-prefixed data structure, signalled by the presence of the UTF-8 continuation marker bits which should never be present in the first byte of a string. I really like ColdString because it's so much in the "use the whole buffalo" spirit of these modern safe yet high performance types. UTF-8 has what are called "overlong prefixes" because it was invented before Unicode decided it would never grow beyond U+10_FFFF and these are often just a useless impediment, but ColdString uses those prefixes. |
How do CompactString/ColdString compare to std::string implementations performance-wise? From the looks of it, they must be somewhat slower than C++ strings