Hacker News
by Dwedit 552 days ago
The problem with string views is that they borrow from the parent string, so you'd need to hold a strong reference to the parent string. This is easy to do in a garbage-collected language, because you don't have to do anything. But it's a lot more complicated if you need to do this with reference counting. Do you make every single string view update the reference counter? Or do you make a special, lighter string view that doesn't keep a counted reference and is subject to memory safety issues?

Yep, you're right. One way to make this less of a problem is to make this distinction at the type level, having both an owned_string and a string_view for example. You can even make owned_string store its length inline.
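
For what it's worth, Rust's `String` / `&str` pair is one existing instance of that owned-vs-view split. A minimal sketch (the function names `first_word` and `demo_first_word` are made up for illustration):

```rust
// Borrowed view in, borrowed view out: no allocation and no reference count.
// The returned `&str` points into the same buffer as `text`.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn demo_first_word() -> usize {
    // Owned type: holds the buffer and frees it when dropped.
    let owned: String = String::from("hello string views");
    // View type: a pointer plus a length into `owned`'s buffer.
    let view: &str = first_word(&owned);
    // Dropping `owned` before this point would be rejected at compile time,
    // because `view` still borrows from it.
    view.len()
}
```

Here the compiler checks the borrow statically, which is how the view gets away without a counted reference.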
These are regular questions in languages with (and without) reference counting, what’s so special about string views?
Typically you need 4 pointers to represent a string view that holds a strong (counted) reference to its source:

* One for the start of the source string, with an inline strong count
* One for the end of the source string so you know how much to deallocate (only really applicable to Rust)
* One for the start of the view
* One for the end of the view

32 bytes for each string view (with 8-byte pointers) is quite a lot. Depending on context you could use 32-bit lengths instead of end-pointers if you're OK with <4GB strings, saving 8 bytes.
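
A rough sketch of those two layouts, to check the arithmetic. The struct and field names are hypothetical, and the 32/24 byte figures assume a 64-bit target:

```rust
use std::mem::size_of;

// Hypothetical four-pointer layout from the list above.
#[allow(dead_code)]
struct WideView {
    src_start: *const u8,  // start of the source string (strong count stored inline there)
    src_end: *const u8,    // end of the source string, for deallocation
    view_start: *const u8, // start of the view
    view_end: *const u8,   // end of the view
}

// Hypothetical compact layout: 32-bit lengths replace the two end pointers.
#[allow(dead_code)]
struct CompactView {
    src_start: *const u8,
    view_start: *const u8,
    src_len: u32,  // limits the source string to < 4 GiB
    view_len: u32, // limits the view to < 4 GiB
}

// Returns (size of WideView, size of CompactView) in bytes.
fn view_sizes() -> (usize, usize) {
    (size_of::<WideView>(), size_of::<CompactView>())
}
```

On a 64-bit target that is four 8-byte pointers versus two pointers plus two 4-byte lengths.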

There's basically no distinction between a string view and an array slice. It's borrowing an array, and the view is nothing but a reference to the parent, start position, and length.

But views are also often implemented as just a plain pointer and a length, with no reference to the parent, and that's where the memory safety issues from borrowing begin.

I understand the concern, but can’t you just maintain an actual reference field to parent_str in a string view? Unless I missed some no-extra-fields constraint itt, then sorry for the noise.
Let's say you took a document as a string, and split it up into words using a lot of string views. Every string view created would affect the reference count of the parent string. Then every time you work with the string views, saving temporary instances, passing them to a function, assigning them, whatever, you're affecting the parent string's reference count.
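
To make that concrete, here is a sketch of the document-splitting case with a counted view. `CountedView` and `split_words` are invented names for illustration, and `Rc` is the single-threaded (non-atomic) counter:

```rust
use std::ops::Range;
use std::rc::Rc;

// Hypothetical counted view: keeps the parent alive and stores a byte range into it.
#[derive(Clone)]
struct CountedView {
    parent: Rc<str>,
    range: Range<usize>,
}

impl CountedView {
    fn as_str(&self) -> &str {
        &self.parent[self.range.clone()]
    }
}

// Split a document into whitespace-separated words, one counted view per word.
// Every view created bumps the parent's reference count.
fn split_words(doc: &Rc<str>) -> Vec<CountedView> {
    let mut views = Vec::new();
    let mut start: Option<usize> = None;
    for (i, ch) in doc.char_indices() {
        match (ch.is_whitespace(), start) {
            (false, None) => start = Some(i),
            (true, Some(s)) => {
                views.push(CountedView { parent: Rc::clone(doc), range: s..i });
                start = None;
            }
            _ => {}
        }
    }
    // Flush the last word if the document does not end in whitespace.
    if let Some(s) = start {
        views.push(CountedView { parent: Rc::clone(doc), range: s..doc.len() });
    }
    views
}
```

Cloning any of these views, even for a temporary, touches the parent's counter again, and dropping the clone touches it once more.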

And reference counts are often updated with atomic integer operations, so it might not be a regular memory increment; instead it would be an interlocked increment. And if there are multiple threads, the CPU cores will compete over which one gets to hold the cache line containing the reference counter. (There is a way around this: you can give each thread its own reference counter.)
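
One possible way to do the per-thread counter in Rust, as a sketch and not the only approach: each thread takes a single `Arc` clone (one atomic increment), then wraps it in a thread-local `Rc` so further copies only touch a non-atomic counter. The function `per_thread_counts` and its parameters are made up for illustration:

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

// Returns the shared (atomic) count after all workers finish, plus the
// local (non-atomic) count each worker observed while holding its copies.
fn per_thread_counts(doc: Arc<str>, workers: usize, local_clones: usize) -> (usize, Vec<usize>) {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // One atomic increment per thread on the shared counter.
            let shared = Arc::clone(&doc);
            thread::spawn(move || {
                // Thread-local counter wrapped around the shared handle.
                let local: Rc<Arc<str>> = Rc::new(shared);
                // These clones are plain increments on the thread's own counter.
                let _copies: Vec<Rc<Arc<str>>> =
                    (0..local_clones).map(|_| Rc::clone(&local)).collect();
                Rc::strong_count(&local)
            })
        })
        .collect();
    let locals: Vec<usize> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    // All workers have dropped their handles, so only `doc` itself remains.
    (Arc::strong_count(&doc), locals)
}
```

The shared counter sees one increment and one decrement per thread, no matter how many local copies each thread makes.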