by pjmlp 1615 days ago
My point regarding WG14 wasn't to add SDS as they are, rather vocabulary types for strings and arrays in the same spirit as SDS.

When they exist as vocabulary types, the ecosystem can rely on their existence and slowly adopt their use, similarly to threads support introduction in C11, for example.

2 comments

> My point regarding WG14 wasn't to add SDS as they are, rather vocabulary types for strings and arrays in the same spirit as SDS.

Well, yes, I'd love to see some proper string support too, so at least we're in agreement about that :-)

But, overhauling C with additional (memory-safe) array types and string types that are nonetheless still compatible with legacy uses is probably a non-starter anyway. The only way forward would be to add a new type that isn't compatible, which is unpalatable to a lot of people (myself included).

Adding memory-safe functions and/or semantics is easier, but will probably not cover 100% of the memory-safety desired.

> When they exist as vocabulary types, the ecosystem can rely on their existence and slowly adopt their use, similarly to threads support introduction in C11, for example.

Threads, I feel, are a poor example for two reasons: 1) hardly any code uses the C11 `thrd_t` type, for a variety of reasons, and 2) there was no need for `thrd_t` to be backward compatible with anything.

For full memory safety with C, the only option is the C Machines, meaning hardware memory tagging.

Already in use for a decade in Solaris SPARC, and eventually mainstream across all variations of ARM CPUs.

Unfortunately Intel botched their MPX implementation and now it is gone.

Apart from plain old fixed buffers, which is what is supported by C just fine and which covers 99% of string processing needs in the areas that C as a language is suited for anyway, ... there are 14 known ways of doing "strings" depending on circumstance, so I don't think it would be a good idea to introduce one mandatory version of them into the C standard. There is already C++ which has std::string, and there are a lot of GC'ed and scripting languages that are more suited for quick and dirty string processing.
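To make the "plain old fixed buffers" style concrete, here is a minimal sketch, assuming a hypothetical helper called `format_pair` (the name and the "key=value" format are made up for illustration). The caller owns the buffer, and `snprintf` reports truncation instead of overrunning it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats "key=value" into a caller-supplied fixed
   buffer. Returns 0 on success, -1 if the result did not fit. On
   truncation the buffer still holds a NUL-terminated prefix. */
int format_pair(char *buf, size_t cap, const char *key, const char *value)
{
    assert(buf != NULL && cap > 0);
    int n = snprintf(buf, cap, "%s=%s", key, value);
    if (n < 0 || (size_t)n >= cap)
        return -1;  /* encoding error, or output was truncated */
    return 0;
}
```

A caller would declare something like `char buf[64];` on the stack and pass `sizeof buf` as the capacity, so no allocation or string "object" is involved.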
The fact that C++ was able to eventually standardize on a single string type (despite the same mess of many dozens of incompatible implementations) shows that it is possible and desirable. It's not like raw buffers will go anywhere if you add a higher-level type. Nor does it have to be perfect - only "good enough" for use across the API surfaces of various libraries.
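As a rough sketch of what a "good enough" type for API surfaces might look like, assume a non-owning pointer-plus-length pair. The names `str_view`, `str_view_from_cstr`, `str_view_sub` and `str_view_eq` are hypothetical; this is neither SDS nor anything WG14 has proposed, just an illustration of the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical vocabulary type: a non-owning view of bytes plus a length. */
struct str_view {
    const char *data;
    size_t len;
};

/* Wraps an existing NUL-terminated string without copying it. */
struct str_view str_view_from_cstr(const char *s)
{
    assert(s != NULL);
    return (struct str_view){ s, strlen(s) };
}

/* Returns the sub-view [pos, pos + n), clamped to the bounds of v. */
struct str_view str_view_sub(struct str_view v, size_t pos, size_t n)
{
    if (pos > v.len)
        pos = v.len;
    if (n > v.len - pos)
        n = v.len - pos;
    return (struct str_view){ v.data + pos, n };
}

/* Compares two views by length and content. */
int str_view_eq(struct str_view a, struct str_view b)
{
    return a.len == b.len && (a.len == 0 || memcmp(a.data, b.data, a.len) == 0);
}
```

Raw buffers keep working alongside it: a view can be built from any `const char *`, and the length travels with the pointer across the API boundary.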
Just because it's possible to standardize on a string type in C doesn't mean it's desirable. Also consider that it's not possible to copy C++'s string type because its ergonomics build heavily on RAII.

'const char *' arguments work just fine as parameters in libraries, and I don't see much of a use case (and instead more hazards) for a library that "resizes" a string argument destructively (like std::string does). The typical way to go about this is for the library to make a copy of the input string. On API boundaries, for memory that is needed longer than the function call lifetime, it is almost always an excellent idea to simply copy it. For data that doesn't make sense to copy (be it because of size or because only one side really needs it), the data should instead simply be created on the right side of the fence from the beginning.
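A minimal sketch of the copy-at-the-boundary pattern, assuming a hypothetical library object `struct logger` with `logger_create` and `logger_destroy` (all names invented for illustration): the library takes a `const char *`, copies what it needs to keep, and owns that copy from then on.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical library object that needs a name beyond the call's lifetime. */
struct logger {
    char *name;  /* private copy, owned by the logger */
};

/* Copies `name`; the caller's buffer may be reused or freed afterwards.
   Returns NULL if allocation fails. */
struct logger *logger_create(const char *name)
{
    assert(name != NULL);
    struct logger *lg = malloc(sizeof *lg);
    if (lg == NULL)
        return NULL;
    size_t size = strlen(name) + 1;  /* include the terminating NUL */
    lg->name = malloc(size);
    if (lg->name == NULL) {
        free(lg);
        return NULL;
    }
    memcpy(lg->name, name, size);
    return lg;
}

/* Releases the logger together with its private copy of the name. */
void logger_destroy(struct logger *lg)
{
    if (lg != NULL) {
        free(lg->name);
        free(lg);
    }
}
```

After `logger_create` returns, neither side holds a pointer into the other's memory, so lifetimes on each side of the boundary can be reasoned about independently.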

I don't see myself needing a standardized string type because I'm not passing around string "objects", or concatenating them, like it would be done in quick and dirty scripts. I honestly can't recall where that kind of thing would have been a good idea for my work in the last couple of years, and I'm much in favour of not growing the standard out of proportion. As said, if you desire C++ kind of ergonomics and want to solve more scripting-like tasks, there is already C++ and a ton of other languages.

What I can recall is skimming through a lot of C projects over the years that tried to do object-oriented and scripting-type programming in C (often they wanted to be C++ or Java or even Python but had to be C for some external reason), and that code is always, invariably, an unmaintainable mess where it's impossible to have any confidence that there are no memory errors and leaks. C is simply not suited for that style of programming.