The strbuf library that's part of git.git is a pleasure to work with. It's C-string compatible (just a "char *"/size_t pair), guarantees that the "char *" member is always NUL-terminated, but can also be used for binary data. It also guarantees that the "buf" member is never NULL: https://github.com/git/git/blob/v2.34.0/strbuf.h#L6-L70
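To make those guarantees concrete, here is a minimal sketch of a buffer with the same invariants (never-NULL pointer, always NUL-terminated, binary-safe length). It is not git's code: `mini_buf`, `mini_buf_init`, `mini_buf_add` and `mini_buf_release` are hypothetical names, and overflow checking is left out for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of the idea; NOT git's actual strbuf code. */
struct mini_buf {
    size_t alloc;  /* bytes allocated for buf; 0 while using the shared empty buffer */
    size_t len;    /* bytes in use, excluding the terminating NUL */
    char *buf;     /* never NULL; buf[len] is always '\0' */
};

/* Shared one-byte buffer, so an empty mini_buf needs no allocation. */
static char mini_buf_empty[1];

static void mini_buf_init(struct mini_buf *mb)
{
    mb->alloc = 0;
    mb->len = 0;
    mb->buf = mini_buf_empty;
}

/* Append n raw bytes (embedded NULs allowed). Returns 0, or -1 if allocation fails. */
static int mini_buf_add(struct mini_buf *mb, const void *data, size_t n)
{
    assert(mb != NULL && mb->buf != NULL);

    size_t need = mb->len + n + 1;  /* +1 for the terminator */
    if (need > mb->alloc) {
        size_t new_alloc = mb->alloc ? mb->alloc * 2 : 16;
        while (new_alloc < need)
            new_alloc *= 2;
        /* The shared empty buffer must never be passed to realloc. */
        char *p = mb->alloc ? realloc(mb->buf, new_alloc) : malloc(new_alloc);
        if (p == NULL)
            return -1;
        mb->buf = p;
        mb->alloc = new_alloc;
    }
    if (n > 0)
        memcpy(mb->buf + mb->len, data, n);
    mb->len += n;
    mb->buf[mb->len] = '\0';
    return 0;
}

/* Free any allocation and return to the empty, still-valid state. */
static void mini_buf_release(struct mini_buf *mb)
{
    if (mb->alloc)
        free(mb->buf);
    mini_buf_init(mb);
}
```

Because `buf` is always a valid NUL-terminated pointer, it can be handed to `printf("%s")` or `strlen` at any time, while `len` keeps the true size when the contents are binary.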
> WG14 could naturally work into something like SDS for strings and arrays, but of course that is out of their goals to ever do that.
Maybe it is, but even if it weren't, sds strings are a poor choice. I used them extensively in a private project and ran into these problems:
1. Typedef'ing `sds` to a pointer type. This leaves no indication to the reader of code that any `sds` typed variable needs an `sdsfree`. IOW, for every other standard type it is clear when the data object needs a `free`, `fclose`, etc. This is a big deal, and it's difficult to change the typedef for sds because of the way it returns pointers.
2. Not compatible with current string functions, strike 1: storing binary data in the strings, such as the NUL character, makes them silently lose data when passed to current string functions that accept `const char *`. This is a very big deal!
3. Not compatible with current string functions, strike 2: an sds string is only compatible with current string functions that take a `const char *`. This isn't such a big deal (for example, it provides a replacement for `strtok` as the standard `sds` type won't work for `strtok`) but it's unnecessarily incompatible.
4. With the current way it's exposed to a caller, you cannot use `const sds` variables anywhere, which removes a lot of compiler-checking. Trying to use `const` on any sds variable is pointless as you get none of the error-checking.
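Points 2 and 4 can be shown in plain C without sds itself. This is a sketch that assumes the typedef has the shape `typedef char *sds;`; the names `str_handle`, `clobber_first` and `c_visible_len` are hypothetical stand-ins.

```c
#include <string.h>

/* Hypothetical stand-in with the same shape as a pointer typedef like sds. */
typedef char *str_handle;

/*
 * `const str_handle` is `char *const`, not `const char *`: the pointer
 * cannot be reassigned, but the characters remain writable.
 */
static void clobber_first(const str_handle s)
{
    s[0] = '#';  /* accepted by the compiler: the pointee is not const */
}

/*
 * Any API taking `const char *` stops at the first NUL, so bytes stored
 * after an embedded NUL are invisible to it.
 */
static size_t c_visible_len(const char *s)
{
    return strlen(s);
}
```

Calling `c_visible_len` on a five-byte buffer holding `"ab\0cd"` reports 2, and `clobber_first` modifies its argument even though the parameter is declared `const`.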
While sds solves many problems with raw C strings, those problems can be solved by adding standard library functions that work with existing C strings. In addition, it adds a few more problems of its own.
"C strings" really aren't anything worth talking about. People take them way too seriously and then complain that they are "unsafe" or "hard to use". Look, C gives you memory to work with and the rest is up to you. Almost the only thing you want from C with regard to strings is string literals.
It should be obvious that most "string" APIs from libc like strcat, strcpy, but especially strtok are ridiculously bad and are only in the libc because of history. Don't use them.
Even strlen() is rarely a good idea to use, and you can (should?) replace strlen("abc") by sizeof "abc" - 1.
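A small sketch of that substitution, with one caveat: `sizeof` only gives the string length for literals and arrays, and on a `char *` it yields the pointer size instead. `LIT_LEN` and `has_prefix_abc` are names made up for this example.

```c
#include <assert.h>
#include <string.h>

/*
 * Length of a string literal, computed at compile time.
 * Only valid for literals and arrays, never for pointers.
 */
#define LIT_LEN(s) (sizeof(s) - 1)

/* sizeof counts the terminating NUL, hence the "- 1". */
static_assert(LIT_LEN("abc") == 3, "literal length is known at compile time");

/* Prefix test that needs no run-time strlen on the literal. */
static int has_prefix_abc(const char *s)
{
    return strncmp(s, "abc", LIT_LEN("abc")) == 0;
}
```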
My point regarding WG14 wasn't to add SDS as they are, rather vocabulary types for strings and arrays in the same spirit as SDS.
When they exist as vocabulary types, the ecosystem can rely on their existence and slowly adopt their use, similarly to threads support introduction in C11, for example.
> My point regarding WG14 wasn't to add SDS as they are, rather vocabulary types for strings and arrays in the same spirit as SDS.
Well, yes, I'd love to see some proper string support too, so at least we're in agreement about that :-)
But, overhauling C with additional (memory-safe) array types and string types that are nonetheless still compatible with legacy uses is probably a non-starter anyway. The only way forward would be to add a new type that isn't compatible, which is unpalatable to a lot of people (myself included).
Adding memory-safe functions and/or semantics is easier, but will probably not cover 100% of the memory-safety desired.
> When they exist as vocabulary types, the ecosystem can rely on their existence and slowly adopt their use, similarly to threads support introduction in C11, for example.
Threads, I feel, are a poor example for two reasons: 1) Hardly any code uses the `thrd_t` type for a variety of reasons, and 2) There was no need for a `thrd_t` type to be backward compatible with anything.
Apart from plain old fixed buffers, which is what is supported by C just fine and which covers 99% of string processing needs in the areas that C as a language is suited for anyway, ... there are 14 known ways of doing "strings" depending on circumstance, so I don't think it would be a good idea to introduce one mandatory version of them into the C standard. There is already C++ which has std::string, and there are a lot of GC'ed and scripting languages that are more suited for quick and dirty string processing.
That's not really a problem if the only thing they need is direct access to a read-only view of the buffer (i.e. const char*) - then it's no different than C++ and std::string.
You will have to define "good". My string library[1][2] is "good" for me because:
1. It's compatible with all the usual string functions (doesn't define a new type `string_t` or similar, uses existing `char *`).
2. It does what I want: a) Works on multiple strings so repeated operations are easy, and b) Allocates as necessary so that the caller only has to free, and not calculate how much memory is needed beforehand.
The combination of the above means that many common string operations that I want to do in my programs are both easy to do and easy to visually inspect for correctness in the caller.
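As an illustration of that style (plain `char *`, several inputs per call, allocation handled by the callee), here is a minimal sketch. It is not the library referred to above: `str_concat` is a hypothetical function written only for this example.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not the commenter's actual API: concatenate any
 * number of C strings into a freshly malloc'ed buffer. The argument list
 * must end with (const char *)NULL. Returns NULL on allocation failure;
 * the caller frees the result.
 */
static char *str_concat(const char *first, ...)
{
    va_list ap;
    size_t total = 0;

    /* Pass 1: measure all inputs. */
    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, const char *))
        total += strlen(s);
    va_end(ap);

    char *out = malloc(total + 1);
    if (out == NULL)
        return NULL;

    /* Pass 2: copy them back to back. */
    char *p = out;
    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, const char *)) {
        size_t n = strlen(s);
        memcpy(p, s, n);
        p += n;
    }
    va_end(ap);

    *p = '\0';
    return out;
}
```

A call such as `str_concat(dir, "/", name, (const char *)NULL)` leaves the caller with one `free` and no size arithmetic. The explicit cast on the sentinel matters because a bare `NULL` may be passed as an `int` through the variadic list.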
Others will say that this is not good, because it still uses and exposes `char *`.
No? Asking for code nav and you get three answers. Asking for this and you get crickets. In the 90s I worked at a place where we embedded Tcl into all the apps and rolled our own templating systems. I had to do a little string stuff in C after a few years of Go, and it sucked. Ugh. `buf[len] = '\0';`
Using Go, I thought I was getting back to low-level stuff, but this C experience made me appreciate strings in Go. Web servers in C are a crazy bad idea, especially if they are spitting out HTML. Lisp would be better. Node would be better. Go would be better.
So why didn't you use one of the bazillion library functions or third party libraries that terminate strings for you?
I feel like most of the criticism is coming from people who punish themselves by rejecting library functions and then saying that strings are hard. D'oh.
I like to see the terminating NUL in there, just in case my math was off earlier. I had strdup and so on. It was just way more work than Go. And I was writing an LD_PRELOAD library, so I didn't want to drag in any dependencies other than libc. Trust me, it is faster and safer and more fun in Go than C.
Alright, yeah, if you're working under constraints like that, C sucks more than it has to.
It just doesn't feel fair to criticize C without mentioning that your experience comes from working under such unusual constraints. "Strings are hard, C sucks" is quite different from "strings are hard, C sucks when you can't rely on libraries." Also feels unfair to say you get crickets when you ask for a lib but actually you weren't even willing to use one. (There are tons of string libraries for C, it's impossible to miss them if you look around.)
Even then, if you're stuck working with only libc and your own code, there's a very high chance you're doing something wrong if you're doing math on strings and terminating them manually. There's a fair selection of libc functions that do all the math for you and will always output a properly terminated string, as long as you pass a valid buffer, its correct size, and properly terminated input strings.
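`snprintf` is one such libc function: given a buffer and its real size (greater than zero), it does the length arithmetic and always writes a terminating NUL, truncating if necessary. A minimal sketch, where `format_pair` is a hypothetical wrapper written for this example:

```c
#include <stdio.h>

/*
 * Format "key=value" into a fixed buffer without manual length math.
 * snprintf NUL-terminates whenever size > 0 and returns the length the
 * full output would have had, which makes truncation detectable.
 * Returns 0 if everything fit, -1 on truncation or output error.
 */
static int format_pair(char *dst, size_t size, const char *key, const char *value)
{
    int n = snprintf(dst, size, "%s=%s", key, value);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

Even in the truncated case the buffer stays a valid C string, so there is no `buf[len] = '\0'` to get wrong.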