by Joker_vD 79 days ago
Yeah, I too feel that storing the array's length glued to the array's data is not that good of an idea, it should be stored next to the pointer to the array aka in the array view. But the allure of having to pass around only a single pointer is quite strong.
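
To make the two layouts concrete, here is a minimal C++ sketch. All the names (`make_prefixed`, `prefixed_len`, `free_prefixed`, `View`, `subview`) are made up for illustration and don't come from any particular library. The first layout glues the length in front of the bytes, so a single pointer is enough; the second keeps the length next to the pointer, which is what lets a view describe a sub-range without copying.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Layout 1 (hypothetical): the length is glued to the data.
// Memory looks like [ size_t len ][ bytes... ] and callers hold
// a single pointer to the first byte.
char* make_prefixed(const char* src, std::size_t n) {
    void* block = std::malloc(sizeof(std::size_t) + n);
    if (!block) return nullptr;
    std::memcpy(block, &n, sizeof n);                 // header: the length
    char* data = static_cast<char*>(block) + sizeof(std::size_t);
    std::memcpy(data, src, n);                        // payload: the bytes
    return data;
}

// Reads the length back from the header just before the bytes.
std::size_t prefixed_len(const char* data) {
    std::size_t n;
    std::memcpy(&n, data - sizeof(std::size_t), sizeof n);
    return n;
}

void free_prefixed(char* data) {
    if (data) std::free(data - sizeof(std::size_t));
}

// Layout 2 (hypothetical): the length travels next to the pointer.
struct View {
    const char* ptr;
    std::size_t len;
};

// A view can describe any sub-range of existing bytes without copying;
// out-of-range requests are clamped to the end of the original view.
View subview(View v, std::size_t off, std::size_t n) {
    if (off > v.len) off = v.len;
    if (n > v.len - off) n = v.len - off;
    return View{v.ptr + off, n};
}
```

With the prefixed layout, a substring that doesn't start at the beginning needs a fresh allocation (there is no header in front of it), whereas `subview` just adjusts the pointer and the length.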
2 comments

> I too feel that storing the array's length glued to the array's data is not that good of an idea, it should be stored next to the pointer to the array aka in the array view.

That’s not cache-friendly, though. I think the short string optimization may be the best option: keep short strings inline, alongside the string length, and allocate a separate buffer only for longer strings. See https://devblogs.microsoft.com/oldnewthing/20240510-00/?p=10... for how various C++ standard library implementations do that.

> That’s not cache-friendly, though.

How so? The string implementations in that post are pretty much that:

    struct string
    {
        char* ptr;            // points at buf for short strings, else into the heap
        size_t size;          // current length
        union {
            size_t capacity;  // used when the data lives on the heap
            char buf[16];     // inline storage for short strings
        };
    };

The pointer and the size are stored together, and they may optionally be located right next to the string's actual data, but only for very small, locally-allocated, short-lived strings; in normal usage, that pointer points somewhere into the heap.

> they may optionally be located right next to the string's actual data, but only for very small, locally-allocated, short-lived strings

Only for small strings. Being locally allocated or short-lived isn’t required for the short string optimization to take effect.

Also, I can’t find a good reference, but “only for small strings” in many programs means “for most strings”.
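
One way to check this on a given toolchain is to test whether a string's bytes live inside the `std::string` object itself. The helper name `stored_inline` is made up for this sketch, and the result depends on the standard library in use (the standard doesn't require the optimization at all), so treat it as a probe and not a guarantee.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: returns true if s's character data sits inside the
// std::string object itself, i.e. the short string optimization is in
// effect for this particular string on this implementation.
bool stored_inline(const std::string& s) {
    // Compare addresses as integers to avoid relational comparison of
    // pointers into unrelated objects.
    auto obj  = reinterpret_cast<std::uintptr_t>(&s);
    auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= obj && data < obj + sizeof(std::string);
}
```

On the mainstream implementations I'd expect this to return true for something like `std::string("hi")` no matter where the string object itself lives (stack, heap, static storage), and false for a string of a hundred characters. The exact length at which it flips is implementation-specific.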

Is there a reason for the string not to be a struct, so that you're still just passing around a pointer to that struct (or even just passing it by value)?

I'd guess that GP is referring not to interface ergonomics (for which a struct is a perfectly satisfactory solution, as you describe), but to implementation efficiency. A pointer is one word. A slice / string view is two words: a length and a pointer. A pointer to a slice is one word, but requires an additional indirection. I personally agree that slices are probably the best all-around choice, but taking double the memory (and incurring double the register pressure, etc.) is a trade-off that's fair to mention.
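
A rough C++ sketch of the three options, in case it helps. `Slice`, `count_spaces`, `count_spaces_indirect`, and `words` are hypothetical names for illustration, and "one word" here assumes a word is the size of a data pointer.

```cpp
#include <cstddef>

// Hypothetical slice / string view: a pointer plus a length (two words).
struct Slice {
    const char* ptr;
    std::size_t len;
};

// Slice passed by value: both words travel with the call.
std::size_t count_spaces(Slice s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.len; ++i) {
        if (s.ptr[i] == ' ') ++n;
    }
    return n;
}

// Pointer to a slice: only one word is passed, but the callee must first
// load ptr and len through it before touching the bytes (the additional
// indirection).
std::size_t count_spaces_indirect(const Slice* s) {
    return count_spaces(*s);
}

// Size expressed in pointer-sized words.
constexpr std::size_t words(std::size_t bytes) {
    return bytes / sizeof(void*);
}
```

On common 64-bit platforms I'd expect `words(sizeof(Slice))` to be 2 and `words(sizeof(const Slice*))` to be 1. Whether the by-value version actually ends up in two registers depends on the calling convention, so the register-pressure cost varies by ABI.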