|
You know what's nicer than delimiting beginnings and ends of things? Length prefixing. Protocol message formats and data encoding formats both already know what they're going to say before they say it, and so know its octet length. The only reason to use delimiters, ever, is for user-modifiable data (e.g. source code) where you might want to insert or delete characters and have the containing block remain valid.

---

And now, a fun tangent, to show how deeply rooted this confusion is in CS: user-modifiable data was originally the sole use-case for \0-terminated "C strings" in C.

C has two separate types which get conflated nowadays: char arrays, and \0-terminated strings. Most "strings"--as we'd expect to find them in other languages--were, in C, actually char arrays: you knew their length, either because they were string literals and you could sizeof them, or because you had #defined both FOO and FOO_LEN, or because you had just allocated len bytes on the heap for foo, so you could just pass len along with foo. Because you knew their length, you didn't need to use the string.h functions to manipulate them. It was idiomatic (and perfectly safe) C, when dealing with char arrays, to just iterate through them with a for loop.

The concept of \0-termination, and thus what we think of as "C strings", only applied to string buffers: fixed-size, stack-allocated, uninitialized char arrays. The string.h functions are all meant to be employed to manipulate string buffers, and the \0 is intended to mark where the buffer stops being useful data, and starts being uninitialized garbage.

The strings in string buffers had short lifetimes, and didn't usually outlive the stack frame the buffer was declared in. Generally, you'd declare a string buffer, populate it using some combination of string literals, strcat(3), sprintf(3), and system calls, and then pass the string--still sitting inside the buffer--to a system call like stat(2) to get what you're really after.
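As a rough sketch of that short-lived string-buffer pattern (not code from the original comment): the helper name `file_size_in`, the buffer size `PATH_BUF_SIZE`, and the use of snprintf(3) in place of sprintf(3) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

#define PATH_BUF_SIZE 4096  /* assumed size, chosen only for this sketch */

/* Hypothetical helper: returns the size in bytes of dir/name, or -1 on failure. */
long file_size_in(const char *dir, const char *name)
{
    assert(dir != NULL && name != NULL);

    char path[PATH_BUF_SIZE];  /* the string buffer: fixed-size, on the stack, uninitialized */

    /* Populate the buffer; the \0 marks where useful data ends and garbage begins. */
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;  /* formatting failed or the path did not fit */

    /* Hand the string, still sitting inside the buffer, to a system call. */
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;

    return (long)st.st_size;  /* path goes away with this stack frame */
}
```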
That would be the end of both the string buffer's and the string's lifetime.

If you ever did want to preserve the contents of a string buffer into something you could pass around, though, this would be idiomatic:

    int give_me_a_path_string(char **out)
    {
        char buf[MAX_PATH];
        /* ... */
        int len = strlen(buf);
        /* copy only the characters, not the terminating \0;
           malloc failure is left unchecked for brevity */
        *out = memcpy(malloc(len), buf, len);
        return len;
    }

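For illustration only, here is a self-contained variant of the function above together with a consumer that takes the pointer and the length as a pair. The `MAX_PATH` value, the stand-in body that fills the buffer, the added NULL check, and the helper `count_separators` are all assumptions, not part of the original.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PATH 260  /* assumed value; the original leaves MAX_PATH undefined */

/* Same shape as the function above, with a stand-in body and a NULL check. */
int give_me_a_path_string(char **out)
{
    assert(out != NULL);

    char buf[MAX_PATH];
    /* Stand-in for the elided work: put some \0-terminated path in buf. */
    snprintf(buf, sizeof buf, "%s/%s", "/tmp", "example.txt");

    int len = (int)strlen(buf);
    char *copy = malloc(len > 0 ? (size_t)len : 1);
    if (copy == NULL)
        return -1;

    memcpy(copy, buf, (size_t)len);  /* the terminating \0 is not copied */
    *out = copy;
    return len;
}

/* Hypothetical consumer: walks the array with a for loop, never calls strlen. */
size_t count_separators(const char *s, int len)
{
    size_t n = 0;
    for (int i = 0; i < len; i++)
        if (s[i] == '/')
            n++;
    return n;
}
```

Since the result has no terminator, anything that prints or writes it has to be told the length as well, for example `fwrite(path, 1, len, stdout)` or `printf("%.*s", len, path)`.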
Note that, after this function returns, the pointer it has written to doesn't point to a "C string": instead, it's a plain pointer to a heap-allocated array of char, with exactly enough space to hold just those characters. If you want to know how big it is, you look at the return value.

So:

• C has "C strings", but they were only intended as buffers.
• C also has "char arrays", which are really what you should think of as C's equivalent to a "string" datatype. char arrays, not "C strings", are the fundamental data structure for representing and persisting strings in C.
• char arrays are less like "C strings" than they are like Pascal strings: they come in two parts, a block of memory N chars wide, and an int containing N. You don't examine the block to determine the length; the length is explicit.
• Pascal (and thus most modern languages with strings) puts both the length and the character-block on the heap as a unit. C puts the character-block on the heap, but puts the length on the stack. This is more efficient under C's Unix-rooted assumptions: you need the length on the stack if you want to work with it to immediately shove the string through a pipe. |