Hacker News new | ask | show | jobs
by nitrogen 6024 days ago
Other than strlen, string operations can be faster with null-terminated strings. Plus, there are other benefits:

If you want to turn some arbitrary data into a string, just put a 0 byte where you want it to end.

Instead of having to increment and compare both a counter and a pointer (or store a final pointer for comparison) in string-manipulation operations, you just increment the pointer. It makes for very concise loops in strchr, etc.

Tokenizing a string is just a matter of throwing down 0 bytes where the tokens are (as strtok does).

Passing a tailing subset of a string to a function is as easy as adding an integer to the string pointer, rather than requiring a memcpy and length calculation.

2 comments

Null-terminating arbitrary data to turn it into a string is great--if your arbitrary data doesn't have any 0 bytes in it naturally.

Strtok is just as easily handled by handling strings as a 2-item struct: a size_t for length and a pointer (separating the size data from the character array). In fact, that would kick ass: you can non-destructively tokenize, pass around substring arguments, etc.

Storing size separately from the character data, rather than as a prefix to the character data, would indeed alleviate some of the potential problems of a size+data string. I could see myself using such an implementation in situations where the minimal overhead of allocating a separate area on the stack or heap for the size+pointer struct is negligible. Expanding strings would certainly be a lot easier, and it would open the posibility (as you mentioned) for complex data sharing among strings, similarly to Qt4.
On modern processors simple arithmetic like incrementing is effectively free. Control dependence is expensive. Data dependence is somewhere between. Fat pointers do not add any dependence. Storing the end has little cost given shadow registers.

If this had been adopted 20 years ago the likely outcome would be that cpus might have fractionally more integer resources. It wouldn't affect the wallclock speed of any code.

All of the operations you describe are done just as easily with fat pointers with the additional advantage of being applicable to truly arbitrary data, not just null clean strings.