by gliese1337 4568 days ago

    allocation of the RString (which becomes larger and thus more difficult to malloc), and the 23 string bytes that will sit unused for longer strings.
I got the distinct impression (backed up by an actual code snippet defining the maximum embedded string size) that the 23-byte limit was calculated to exactly match the size of the data that would otherwise have to be stored for a heap string anyway. Thus, it doesn't actually take any extra space in the struct, and those 23 bytes do not go unused in longer strings: the same space holds the heap string's fields instead.

You're exactly right on the union: I tried to edit out my error on that before someone noticed (my principal point is about the mallocs), but your comment appeared right as I saved. Shame be upon me.

So this holds (on a 64-bit machine) 8 bytes for the pointer, 8 bytes for the length (string not null terminated), and 8 bytes for the capacity. Alternatively, via a union, it stores 24 bytes of string (null terminated). It knows which of the two layouts it is looking at via a flag that is held separately in RBasic.
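
To make the size argument concrete, here is a minimal C sketch of that kind of layout. All the names (`sketch_string`, `SKETCH_EMBED_MAX`, and so on) are made up for illustration, and the struct is a simplification under the assumptions in this comment, not the actual Ruby source:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; a simplified sketch, not Ruby's real RString. */

/* 23 on a typical 64-bit machine: three machine words minus one byte for NUL. */
enum { SKETCH_EMBED_MAX = 3 * sizeof(void *) - 1 };

/* Stand-in for the common object header; a flag bit in here would record
 * which member of the union below is currently in use. */
struct sketch_basic {
    uintptr_t flags;
    uintptr_t klass;
};

/* Fields needed for a string whose bytes live in a separate heap buffer. */
struct sketch_heap {
    size_t len;   /* length in bytes */
    char  *ptr;   /* heap buffer */
    size_t capa;  /* allocated capacity */
};

struct sketch_string {
    struct sketch_basic basic;
    union {
        struct sketch_heap heap;           /* long strings */
        char embed[SKETCH_EMBED_MAX + 1];  /* short strings, stored inline */
    } as;
};

/* The embedded buffer reuses exactly the space the heap fields would occupy
 * (this holds wherever size_t and pointers have the same size). */
static_assert(sizeof(struct sketch_heap) == SKETCH_EMBED_MAX + 1,
              "embedded buffer should be the same size as the heap fields");

static size_t sketch_heap_size(void)  { return sizeof(struct sketch_heap); }
static size_t sketch_embed_size(void) { return sizeof(((struct sketch_string *)0)->as.embed); }
```

Because the two union members are the same size, embedding short strings costs nothing extra per object compared with a struct that only had the heap fields.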

I retract my jab about memory loss, but it still sounds rather terrible. Every bit of code dealing with strings needs to check flags on every use to determine what it is dealing with, switch between length-specified and null-terminated handling, etc. Ugh.

Look up a C++ string implementation sometime, and you'll find that this is almost exactly how most efficient C++ string implementations do this too.

And it doesn't need to alternate between length-specified and null-terminated: they're all length-specified. (How else do you think short Ruby strings are also able to store ASCII NUL?)
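
A rough C sketch of what "all length-specified" can look like, including for short strings. It assumes the short length is kept in a few spare flag bits; that detail and every name here (`sstr`, `FLAG_HEAP`, `EMBED_LEN_MASK`, ...) are hypothetical and chosen for illustration, not taken from the Ruby source:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names and flag layout; an illustration, not Ruby's real code. */
enum { EMBED_MAX = 3 * sizeof(void *) - 1 };  /* 23 on a 64-bit machine */

#define FLAG_HEAP       ((uintptr_t)1 << 0)   /* set: bytes live on the heap */
#define EMBED_LEN_SHIFT 1
#define EMBED_LEN_MASK  ((uintptr_t)0x1f << EMBED_LEN_SHIFT)  /* 5 bits: 0..31 */

struct sstr {
    uintptr_t flags;
    union {
        struct { size_t len; char *ptr; size_t capa; } heap;
        char embed[EMBED_MAX + 1];
    } as;
};

/* Store a short string inline. Returns 0 on success, -1 if it does not fit.
 * The length goes into the flag bits, so embedded NUL bytes are fine. */
static int sstr_set_embedded(struct sstr *s, const char *bytes, size_t len)
{
    if (len > EMBED_MAX)
        return -1;
    memcpy(s->as.embed, bytes, len);
    s->as.embed[len] = '\0';  /* convenience terminator, not the length source */
    s->flags = (uintptr_t)len << EMBED_LEN_SHIFT;
    return 0;
}

/* Point at an existing buffer (not copied, not owned by the sketch). */
static void sstr_set_heap(struct sstr *s, char *buf, size_t len, size_t capa)
{
    s->flags = FLAG_HEAP;
    s->as.heap.len = len;
    s->as.heap.ptr = buf;
    s->as.heap.capa = capa;
}

/* The only two places that need to look at the flag. */
static size_t sstr_len(const struct sstr *s)
{
    if (s->flags & FLAG_HEAP)
        return s->as.heap.len;
    return (size_t)((s->flags & EMBED_LEN_MASK) >> EMBED_LEN_SHIFT);
}

static const char *sstr_ptr(const struct sstr *s)
{
    return (s->flags & FLAG_HEAP) ? s->as.heap.ptr : s->as.embed;
}
```

With accessors like these, calling code only ever deals with a pointer plus a length; the flag check stays inside `sstr_len` and `sstr_ptr`, and a three-byte string such as `"a\0b"` keeps its length of 3 even though `strlen` would stop at the first byte.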