Hacker News new | ask | show | jobs
by delta_p_delta_x 79 days ago
> C Strings at least let you take a zero copy substring of the tail

This is a special-case optimisation that I'm happy to lose in favour of the massive performance and security benefits otherwise.

Isn't length + pointer... Basically a Pascal string? Unless I am mistaken.

I think what was unsaid in your second point is that we really need to type-differentiate constant strings, dynamic strings, and string 'views', which Rust does in-language, and C++ does with the standard library. I prefer Rust's approach.

3 comments

If I recall correctly a pascal string has the length before the string. Ie to get the length you dereference the pointer and look backwards N bytes to get the length. A pascal string is still a single pointer.

You cannot cheaply take an arbitrary view of the interior string - you can only truncate cheaply (and oob checks are easier to automate). That’s why pointer + length is important because it’s a generic view. For arrays it’s more complicated because you can have a stride which is important for multidimensional arrays.

> Isn't length + pointer... Basically a Pascal string? Unless I am mistaken.

Length + pointer is a record string, a pascal string has the length at the head of the buffer, behind the pointer.

Many years ago when reading Redis code I saw the same pattern: they pass around simple pointer to data, but there is a fixed length metadata just before that.
I assume it’s either Antirez’s sds or a variant / ancestor thereof, yes. It stores a control block at the head of the string, but the pointer points past that block, so it has metadata but “is” a C string.
Pascal strings store the string's length by its data, whereas fat pointers store the length by the address of the data.

The main difference is that if a string's length is by its data, you can't easily construct a pointer to part of that data without copying it into a new string, whereas if instead the length is by the data's address, you can cheaply construct pointers to any substring (by coming up with new length+address pairs) without having to construct entire new strings.