by arielby 3945 days ago

NUL-terminated strings aren't that bad:

* unlike Pascal-style strings, they can be usefully sliced, especially if you can modify them strtok-style.

* unlike (ptr,len) "Modern C buffers"/Rust-style strings, references to them are pointer-sized, and they can be used as a serialization format.

This makes applications that are built around cutting a string into pieces and passing those pieces around noticeably faster, especially compared to, say, C++'s "atomically reference-counted, re-allocating at the slightest touch" std::string.
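As a rough illustration of the strtok-style slicing described above, here is a minimal sketch in C. The helper name `split_in_place` is hypothetical, not a standard function; it assumes the caller owns a writable buffer and is happy to have separators overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `s` in place on `sep`.
   Each separator found is overwritten with '\0', so every piece is itself
   a NUL-terminated string referenced by a single plain pointer.
   Stores at most `max` piece pointers in `out` and returns how many were
   stored; any text after the last stored piece is simply not reported. */
static size_t split_in_place(char *s, char sep, char **out, size_t max)
{
    assert(s != NULL && out != NULL && sep != '\0');
    size_t n = 0;
    while (n < max) {
        out[n++] = s;              /* the piece starts here */
        char *p = strchr(s, sep);  /* find the next separator, if any */
        if (p == NULL)
            break;                 /* last piece runs to the original NUL */
        *p = '\0';                 /* terminate the piece in place */
        s = p + 1;
    }
    return n;
}
```

No allocation or copying happens: the pieces are just pointers into the original buffer, and each one can be handed to any function that expects an ordinary C string.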

This style of programming is not particularly popular nowadays, so buffer-strings are usually the better fit. The main problem with the NUL-terminated style is its multitude of edge cases, which tend to demonstrate C's "every bug is exploitable" problem well.

2 comments

Strilanc 3945 days ago

> unlike Pascal-style strings, they can be usefully sliced, especially if you can modify them strtok-style.

Slicing Pascal-style strings is also easy and constant-time: just track the buffer, offset, and length of the slice of characters you want. Java used to do it implicitly whenever you called `substring`.
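The buffer/offset/length approach might look roughly like this in C. The `struct slice` type and both helpers are made up for illustration; they are not taken from any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slice type: a view into a buffer owned by someone else. */
struct slice {
    const char *buf;  /* start of the underlying buffer */
    size_t off;       /* offset of the first character in view */
    size_t len;       /* number of characters in view */
};

/* Constant-time sub-slice: only the offset and length change,
   no characters are copied or modified. */
static struct slice slice_sub(struct slice s, size_t start, size_t len)
{
    assert(start <= s.len && len <= s.len - start);
    struct slice r = { s.buf, s.off + start, len };
    return r;
}

/* Compare a slice against a NUL-terminated string. */
static int slice_eq_cstr(struct slice s, const char *z)
{
    return strlen(z) == s.len && memcmp(s.buf + s.off, z, s.len) == 0;
}
```

Unlike the in-place approach, this works on read-only data and can take a slice from the middle of a string without touching it, at the cost of a three-word reference instead of a single pointer.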

> unlike (ptr,len) "Modern C buffers"/Rust-style strings, references to them are pointer-sized, and they can be used as a serialization format.

Every C method that takes a character buffer either a) has a corresponding length parameter or b) is avoided because of the security risks. In practice this means that C also stores the length information, just on the side instead of combined into a struct with the buffer.

arielby 3945 days ago

> Slicing Pascal-style strings is also easy and constant-time: just track the buffer, offset, and length of the slice of characters you want. Java used to do it implicitly whenever you called `substring`.

That's just coercing into a "modern C buffer" and slicing it. It has the disadvantage that coercion is not equality or subtyping - i.e. you will have to do lots of wrappings and unwrappings in mixed code.
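To make the wrapping/unwrapping cost concrete, here is a small sketch assuming a hypothetical `struct view` (ptr, len) type. Going from a C string to a view needs a `strlen` walk; going back generally needs a copy, because a view into the middle of a buffer has no terminator of its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical (ptr, len) view type used only for this illustration. */
struct view {
    const char *ptr;
    size_t len;
};

/* Wrapping: no copy, but the length has to be computed by scanning. */
static struct view view_from_cstr(const char *z)
{
    assert(z != NULL);
    struct view v = { z, strlen(z) };
    return v;
}

/* Unwrapping: the view may end in the middle of a buffer, so a fresh
   NUL-terminated copy is needed. Returns NULL if allocation fails;
   the caller frees the result. */
static char *view_to_cstr(struct view v)
{
    char *z = malloc(v.len + 1);
    if (z == NULL)
        return NULL;
    memcpy(z, v.ptr, v.len);
    z[v.len] = '\0';
    return z;
}
```

Code that mixes both representations ends up calling conversions like these at every boundary between the two styles.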

> Every C method that takes a character buffer either a) has a corresponding length parameter or b) is avoided because of the security risks. In practice this means that C also stores the length information, just on the side instead of combined into a struct with the buffer.

You are surely talking about the buffer's capacity, not the string's length. These are distinct concepts. Anyway, functions that only read strings, and structs that only store them read-only, aren't interested in the capacity of any buffer.
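A short sketch of the capacity/length distinction, using two hypothetical helpers: a writer that must be told the capacity of its destination, and a reader that only needs the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writer: needs the destination's capacity so it cannot overflow.
   Returns 0 on success, -1 if the output was truncated or failed. */
static int format_greeting(char *dst, size_t cap, const char *name)
{
    assert(dst != NULL && name != NULL && cap > 0);
    int n = snprintf(dst, cap, "hello, %s", name);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}

/* Reader: takes only the pointer. The string's length is implied by
   the terminator, and the buffer's capacity is irrelevant here. */
static size_t count_char(const char *s, char c)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}
```

With a 32-byte buffer holding "hello, world", the capacity is 32 while the string length is 12; only `format_greeting` ever needs the former.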

Anyway, C strings aren't responsible for the fixed-size buffers of Cold War-era code - that code uses fixed-size buffers for everything. Their main claim to fame is their popularity in parsing code, which is edge-case- and bug-prone.

tines 3945 days ago

std::string isn't reference-counted in a conforming implementation (that doesn't do atomic ops just for fun).

arielby 3945 days ago

Well, C++11 strings are just "reallocate when you look at them funny". Or you use shared_ptr and are back to square one.
