by sirwhinesalot 812 days ago
It's impossible to avoid "sanitizing" input if you have a conversion step from a library provided char* to a strbuf type. Any use of the strbuf API is guaranteed to be correct.

That's very different from needing to be on your toes with every usage of the strxcpy family.


> It's impossible to avoid "sanitizing" input if you have a conversion step from a library provided char* to a strbuf type. Any use of the strbuf API is guaranteed to be correct.

I agree: having a datatype beats sanitising input (I think there's a popular essay somewhere about parsing input vs sanitising input which makes pretty much the same point as you do), but it's still only partially correct.

To get to fully correct you don't need a new string type. You need developers to recognise that the fields "Full Name", "Email address" and "Phone number", while all stored as strings, are actually different types, and to handle them as such: make those types incompatible, so that a `string_copy` call fails to compile when the destination is "EmailAddressType" and the source is "FullNameType".
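A minimal sketch of what I mean, in plain C. The names (`FullName`, `EmailAddress`, `email_parse`, `email_copy`) are made up for illustration, and the email check is deliberately simplistic:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper types: both hold a C string, but each struct is
   its own distinct type as far as the compiler is concerned. */
typedef struct { char value[64]; } FullName;
typedef struct { char value[128]; } EmailAddress;

/* The only way to obtain an EmailAddress is to parse one.
   Returns 1 on success, 0 on failure. The check is a placeholder:
   non-empty local part, an '@', and a non-empty remainder. */
int email_parse(EmailAddress *out, const char *raw)
{
    const char *at = strchr(raw, '@');
    size_t len = strlen(raw);

    if (at == NULL || at == raw || at[1] == '\0' || len >= sizeof out->value)
        return 0;
    memcpy(out->value, raw, len + 1);   /* len + 1 copies the NUL too */
    return 1;
}

/* Same idea for names: non-empty and small enough to fit. */
int full_name_parse(FullName *out, const char *raw)
{
    size_t len = strlen(raw);

    if (len == 0 || len >= sizeof out->value)
        return 0;
    memcpy(out->value, raw, len + 1);
    return 1;
}

/* Typed copy: only accepts an EmailAddress on both sides. */
void email_copy(EmailAddress *dst, const EmailAddress *src)
{
    assert(dst != NULL && src != NULL);
    *dst = *src;
}

/* Given `FullName name; EmailAddress email;`:
 *
 *     email_copy(&email, &name);   // incompatible pointer types
 *     email = name;                // incompatible struct types, always an error
 *
 * The first line is diagnosed by the compiler (recent GCC releases reject
 * it by default; older compilers warn, and -Werror makes it fatal).
 */
```

The storage underneath is still a NUL-terminated char array and the implementation still uses the ordinary `str*`/`mem*` functions. The only thing that changed is that the compiler now knows a name is not an email address.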

Developers in C can, right now, do that with only a few minutes of extra typing effort. Adding a "proper" string type is still going to result in someone, somewhere, parsing a uint8_t from a string into a uint64_t, and then (after some computation) converting that (now out-of-range) uint64_t back into a uint8_t.
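To make the uint8_t case concrete, here is a rough sketch of the checks that have to exist regardless of which string type the digits arrived in. `parse_u8` and `u8_from_u64` are hypothetical helpers, not part of any library:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal uint8_t from a NUL-terminated string.
   Returns 1 on success, 0 on malformed or out-of-range input. */
int parse_u8(const char *s, uint8_t *out)
{
    char *end;
    unsigned long v;

    assert(s != NULL && out != NULL);
    if (s[0] < '0' || s[0] > '9')     /* rejects "", signs, leading spaces */
        return 0;
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v > UINT8_MAX)
        return 0;
    *out = (uint8_t)v;
    return 1;
}

/* Checked narrowing: refuses a value that no longer fits, instead of
   silently truncating it the way a plain (uint8_t) cast would. */
int u8_from_u64(uint64_t wide, uint8_t *out)
{
    assert(out != NULL);
    if (wide > UINT8_MAX)
        return 0;
    *out = (uint8_t)wide;
    return 1;
}
```

Parse "200", double it in a uint64_t, and a bare cast back to uint8_t quietly gives you 144. No string type, safe or otherwise, is involved in that bug.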

If you're doing the right thing and creating types because "Parse, Don't Validate", a better string type doesn't bring any benefits. If you're doing the wrong thing and validating inputs, then you're going to miss one anyway, no matter the underlying string type.

Sure but now we're talking about a universal problem across languages, rather than a C-specific problem.

> Sure but now we're talking about a universal problem across languages, rather than a C-specific problem.

Of course, but that's my point - C already gives you the ability to fix the incorrect typing problem, using the existing foundational `str*` functions.

A team that is not using the compiler's ability to warn when mixing types is still going to mix types when there is a safe strbuf_t type.

The problem with the `str*` functions can be fixed today without modifying the language or its stdlib.

Most C programmers don't do it (myself included). I think that, in one sense, you are correct in that removing the existing string representation (and the functions for it) and replacing it with a len+data representation for strings will fix some problems.

Trouble is, a lot of useful tokenising/parsing/etc. string operations are not possible in a len+data representation (each strtok()-type function, for example, needs to make a copy of what it returns), so programmers are just going to do their best to bypass them.

Having programmers trained to create new string types using existing C is just easier, because then you solve the whole 'mixing types' problem even when looking at replacements for things like `strtok`.
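As a rough illustration of a typed `strtok` replacement, assuming a made-up "Full Name,user@host" line format and hypothetical `Contact`, `copy_field` and `contact_parse` names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical distinct types for two string-stored fields. */
typedef struct { char value[64]; } FullName;
typedef struct { char value[128]; } EmailAddress;

typedef struct {
    FullName name;
    EmailAddress email;
} Contact;

/* Copy len bytes from start into dst and NUL-terminate.
   Returns 0 if the field is empty or does not fit in cap bytes. */
static int copy_field(char *dst, size_t cap, const char *start, size_t len)
{
    if (len == 0 || len >= cap)
        return 0;
    memcpy(dst, start, len);
    dst[len] = '\0';
    return 1;
}

/* Split "Full Name,user@host" into typed fields. Unlike strtok(), this
   never writes to the input; it only uses strcspn/strchr/strlen on it.
   On failure, *out may have been partially written. */
int contact_parse(Contact *out, const char *line)
{
    size_t name_len;
    const char *email;

    assert(out != NULL && line != NULL);
    name_len = strcspn(line, ",");      /* length up to the first comma */
    if (line[name_len] != ',')
        return 0;                       /* no separator found */
    email = line + name_len + 1;
    if (strchr(email, '@') == NULL)     /* placeholder email check */
        return 0;
    return copy_field(out->name.value, sizeof out->name.value, line, name_len)
        && copy_field(out->email.value, sizeof out->email.value,
                      email, strlen(email));
}
```

The tokens come out already typed, so there is never a bare `char *` floating around that could be either a name or an email.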

Or ... maybe I'm completely off-base and the reason that programmers don't create different types for string-stored data is because it is too much work in current C-as-we-know-it.