Hacker News new | ask | show | jobs
by shin_lao 4290 days ago
Yeah my example for copy elision sucked, but that doesn't mean it cannot play in favor when you work by value.

Sharing is only multithreading unfriendly if there's modification happening, modification of textual (i.e. Unicode) data is bad practice and hard to get right

Read-only access to data indeed scales "infinitely" on modern architectures.

No, a string_view points into memory that already exists,

Yes. Right. How do you store that? You need at least one pointer and and an int or two pointers. That 16 bytes. Memcpy for a couple of bytes is very quick when it's stack to stack thanks to page locality.

Also, if you are using pointers you will have aliasing issues which will have an impact on performance. If you work by values you allow the compiler to optimize things better.

For small strings string view are just dumb and "most of the time" strings are very small.

To give a better example of why working a string view is both a bad idea and dangerous, it's as if you said "I don't want to copy this vector, therefore I will work on iterators". That's obviously a bad idea.

1 comments

> Yeah my example for copy elision sucked, but that doesn't mean it cannot play in favor when you work by value.

Not just sucked; it was entirely wrong. Copy elision is not related to std::string vs. string_view. Even with copy elision turned up to 11, returning a std::string will be more expensive than a string_view.

> How do you store that? You need at least one pointer and and an int or two pointers. That 16 bytes. Memcpy for a couple of bytes is very quick when it's stack to stack thanks to page locality.

I was very careful to cover exactly this in my comment.

Computing the memcpy is strictly more work than creating a string_view, since you need the information that is stored in a string view (i.e. pointer and length) to call memcpy.

Furthermore, the 'stack string' is actually stored contained inside a std::string value, which is larger than 16 bytes. There is no way that returning a string_view causes higher memory use at the call site than returning a std::string. (If you're complaining that it forces old strings to be kept around, well, you can always copy a string_view to a new std::string if you need to, i.e. a string_view can do the expensive 'upgrade' option on demand.)

Here's the quote from my comment above:

> a small string copied on to the stack will be part of the string struct, which is at least 3 * 8 = 24 bytes: a pointer, the length and the capacity. Also, a memcpy out of the original string is always going to be more expensive than just getting the pointer/length (or pair of pointers) for a string_view, since the memcpy has to do this anyway.

> Also, if you are using pointers you will have aliasing issues which will have an impact on performance. If you work by values you allow the compiler to optimize things better.

You do realise that a std::string contains pointers and so on inside it? Furthermore, the small string optimisation (copying to the stack) means every data access to a std::string includes an extra branch.

> For small strings string view are just dumb and "most of the time" strings are very small.

So instead of just having a cheap reference into a string you're happy with the overhead of a function call (memcpy) and a pile of dynamic branches? I wouldn't be surprised if the branches are the major performance burden for std::string-based code that is processing a pile of substrings of some parent string. In this case, the data from the string_views will normally be in cache anyway (i.e. it will've been recently read by the function that decides who to slice into the string_view).

> To give a better example of why working a string view is both a bad idea and dangerous, it's as if you said "I don't want to copy this vector, therefore I will work on iterators". That's obviously a bad idea.

It's not obviously bad to me. In fact, it seems very reasonable to work with iterators rather than copying vectors (isn't that exactly what the algorithm header does?).

If your problem is that it is unsafe and hard to avoid dangling pointers etc, that's just a fundamental problem of C++ and is unavoidable in that language. One fix would be to use Rust; it handles iterators and string_views safely.