Hacker News
by jsmith45 1465 days ago
The COW and short string optimizations are not mutually exclusive. If we assume short string optimization is implemented both before and after, then we are back to comparing an atomic increment to an allocation. And different allocation approaches can make the cost of heap allocation differ quite substantially. I'd fully expect that some allocation approaches are cheaper than the cache line invalidation from an atomic increment, but some others that tend to involve a lot of pointer chasing can be rather costly.

Certainly plenty of widely copied strings are short strings, so a COW implementation that lacks the short-string optimization could very easily be a bad bottleneck for multi-core compute.
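
To make the atomic-increment-versus-allocation trade-off concrete, here is a rough sketch of a CoW handle. Everything in it (CowString, Rep, mutate, demo_cow) is a hypothetical name made up for illustration; it is not the GNU implementation, it has no SSO, and it uses std::string merely as a stand-in for the shared heap buffer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical illustration only; not how libstdc++ actually did it.
class CowString {
    struct Rep {
        std::atomic<std::size_t> refs{1};
        std::string data;  // stand-in for the shared heap buffer
        explicit Rep(std::string s) : data(std::move(s)) {}
    };
    Rep* rep_;

    // Drop one reference; the last owner frees the buffer.
    void release() {
        if (rep_->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) delete rep_;
    }

public:
    explicit CowString(const char* s) : rep_(new Rep(s)) {}

    // Copy: no allocation, just an atomic increment on the shared counter.
    CowString(const CowString& o) : rep_(o.rep_) {
        rep_->refs.fetch_add(1, std::memory_order_relaxed);
    }
    CowString& operator=(const CowString& o) {
        if (rep_ != o.rep_) {
            o.rep_->refs.fetch_add(1, std::memory_order_relaxed);
            release();
            rep_ = o.rep_;
        }
        return *this;
    }
    ~CowString() { release(); }

    const std::string& view() const { return rep_->data; }

    // Write access: if the buffer is shared, pay for the allocation and copy now.
    std::string& mutate() {
        if (rep_->refs.load(std::memory_order_acquire) != 1) {
            Rep* fresh = new Rep(rep_->data);
            release();
            rep_ = fresh;
        }
        return rep_->data;
    }

    std::size_t use_count() const { return rep_->refs.load(std::memory_order_relaxed); }
    bool shares_buffer_with(const CowString& o) const { return rep_ == o.rep_; }
};

inline void demo_cow() {
    CowString a("a fairly long string that would not fit in a typical SSO buffer");
    CowString b = a;                  // atomic increment only
    assert(a.shares_buffer_with(b));
    assert(a.use_count() == 2);
    b.mutate() += "!";                // detach: allocation and copy happen here
    assert(!a.shares_buffer_with(b));
    assert(a.use_count() == 1);
}
```

With an SSO layer on top, short strings would skip both the counter and the heap, which is why the comparison above only matters for the long-string case.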

1 comments

You have accurately described the GNU CoW string :-)

My impression through the fog of history is that what happened was a really clever GNU person with little foresight and no access to an SMP system implemented std::string with CoW. Its performance in practice was so poor that the standards committee intentionally changed the standard to make it an illegal implementation, thereby eradicating the GNU CoW string. There was no higher principled logic.

Yet more recent benchmarks show that there are pretty important use cases where a CoW string can be faster:

https://blogs.msmvps.com/gdicanio/2016/07/09/is-copy-on-writ...

https://oribenshir.github.io/afternoon_rusting/blog/copy-on-...

Also, the point of that was to improve multithreading of strings: I think this very idea is problematic. I've written at this point hundreds of thousands of lines of C++, and the number of times where strings are really, by design, supposed to be shared across threads can honestly be counted on the fingers of one hand, much like the justification for using Arc over Rc in Rust. 99% of string handling is done as some GUI work on the main thread or as part of some task processing done in some network thread, which stays in that thread.
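
For what the Rc-versus-Arc distinction could look like in C++, here is a minimal sketch where the counter is a policy: a plain integer when the string never leaves its thread, an atomic when it does. All names here (LocalCount, SharedCount, SharedText, LocalText, ThreadSafeText, demo_shared_text) are invented for the example and do not come from any library; the handle is read-only to keep it short.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical counter policies, named for illustration only.
struct LocalCount {   // Rc-like: plain integer, must stay on one thread
    std::size_t n = 1;
    void inc() { ++n; }
    bool dec() { return --n == 0; }  // true when the last owner is gone
};
struct SharedCount {  // Arc-like: atomic, may be shared across threads
    std::atomic<std::size_t> n{1};
    void inc() { n.fetch_add(1, std::memory_order_relaxed); }
    bool dec() { return n.fetch_sub(1, std::memory_order_acq_rel) == 1; }
};

template <class Count>
class SharedText {
    struct Rep {
        Count count;
        std::string data;
        explicit Rep(std::string s) : data(std::move(s)) {}
    };
    Rep* rep_;

public:
    explicit SharedText(std::string s) : rep_(new Rep(std::move(s))) {}
    // Copying shares the buffer and bumps whichever counter the policy provides.
    SharedText(const SharedText& o) : rep_(o.rep_) { rep_->count.inc(); }
    SharedText& operator=(const SharedText&) = delete;  // omitted to keep the sketch short
    ~SharedText() { if (rep_->count.dec()) delete rep_; }

    const std::string& str() const { return rep_->data; }
    const void* buffer_id() const { return rep_; }
};

using LocalText = SharedText<LocalCount>;        // no atomics on copy
using ThreadSafeText = SharedText<SharedCount>;  // atomic increment on copy

inline void demo_shared_text() {
    LocalText a("stays on the GUI thread");
    LocalText b = a;  // plain ++ on the counter, no cache line contention
    assert(a.buffer_id() == b.buffer_id());

    ThreadSafeText c("may cross threads");
    ThreadSafeText d = c;  // atomic increment
    assert(c.buffer_id() == d.buffer_id());
}
```

Unlike Rust, nothing in C++ stops you from handing a LocalText to another thread, so the cheap variant relies on discipline rather than the type system.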

Clearly there's a frontier where the cost situation begins to favor the CoW approach, and I think authors should consciously choose whether they want a CoW string or not based on their use-case, but that goes against the idea of std::string as a jack-of-all-trades. Personally I don't really like std::string as a concept. It overlaps with too many other concepts. Is it just vector<char> or std::unique_ptr<char[]> with SSO? The latter is nice in cases where you want std::string to adopt or release existing memory. Or do you want something like absl::Cord, which is like the old GNU CoW string but with even more stunts under the hood?