by pjmlp 1615 days ago
Yes, SDS from the Redis project.

https://github.com/antirez/sds

However the moment you call into other C libraries, they naturally only expect a char *.


I got tired of running into this problem and decided to simply eat the cost of using `char *` in my string library.

And that is why most such efforts eventually die.

WG14 could naturally work on something like SDS for strings and arrays, but of course doing that is outside their goals.

> WG14 could naturally work on something like SDS for strings and arrays, but of course doing that is outside their goals.

Maybe it is, but even if it were, sds strings are a poor choice. I used them extensively in a private project and ran into these problems:

1. Typedef'ing `sds` to a pointer type. This leaves no indication to the reader of the code that any `sds`-typed variable needs an `sdsfree`. In other words, for every other standard type it is clear when the data object needs a `free`, `fclose`, etc. This is a big deal, and it's difficult to change the typedef for sds because of the way it returns pointers.

2. Not compatible with current string functions, strike 1: storing binary data in the strings, such as the NUL character, makes them silently lose data when used with current string functions that accept `const char *`. This is a very big deal!

3. Not compatible with current string functions, strike 2: an sds string is only compatible with current string functions that take a `const char *`. This isn't such a big deal (for example, it provides a replacement for `strtok` as the standard `sds` type won't work for `strtok`) but it's unnecessarily incompatible.

4. With the current way it's exposed to a caller, you cannot use `const sds` variables anywhere, which removes a lot of compiler-checking. Trying to use `const` on any sds variable is pointless as you get none of the error-checking.

While sds solves many problems with raw C strings, those problems can be solved by adding standard library functions that work with existing C strings. In addition, it adds a few more problems of its own.
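To make point 2 concrete, here is a minimal sketch of the silent data loss. The `lenstr` struct and both helper functions are hypothetical stand-ins for any length-carrying string; this is not the actual sds layout or API, just the same idea of "length stored separately from the bytes".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying string (NOT the real sds layout). */
struct lenstr {
    size_t len;        /* payload bytes, embedded NULs included */
    const char *data;  /* payload, followed by a terminating NUL */
};

/* Number of payload bytes that a strlen-based consumer never sees. */
static size_t bytes_hidden_from_strlen(struct lenstr s)
{
    size_t visible = strlen(s.data);  /* stops at the first NUL */
    assert(visible <= s.len);
    return s.len - visible;
}

/* Example payload: 7 bytes with a NUL in the middle. */
static struct lenstr example_payload(void)
{
    struct lenstr s = { 7, "abc\0def" };
    return s;
}
```

Any function that only receives `s.data` as a `const char *` stops at the embedded NUL, so it processes "abc" and nothing warns about the remaining bytes.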

"C strings" really aren't anything worth talking about. People take them way too seriously and then complain that they are "unsafe" or "hard to use". Look, C gives you memory to work with and the rest is up to you. Almost the only thing you want from C with regard to strings is string literals.

It should be obvious that most "string" APIs from libc like strcat, strcpy, but especially strtok are ridiculously bad and are only in the libc because of history. Don't use them.

Even strlen() is rarely a good idea to use, and you can (should?) replace strlen("abc") by sizeof "abc" - 1.
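A small sketch of the `sizeof` trick, for reference. `LIT_LEN` is a hypothetical macro name, not something from libc. Note that it only works on string literals and arrays; applied to a pointer, `sizeof` gives the size of the pointer instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of a string literal as a constant
   expression. Valid for literals and char arrays, not for pointers. */
#define LIT_LEN(s) (sizeof(s) - 1)

static size_t literal_len(void) { return LIT_LEN("abc"); }
static size_t runtime_len(void) { return strlen("abc"); }

/* Pitfall: for a pointer, sizeof yields the pointer size, not the length. */
static size_t pointer_size_not_length(void)
{
    const char *p = "abc";
    return sizeof p;
}

static void demo(void)
{
    assert(literal_len() == runtime_len());
    assert(pointer_size_not_length() == sizeof(char *));
}
```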

My point regarding WG14 wasn't to add SDS as they are, rather vocabulary types for strings and arrays in the same spirit as SDS.

When they exist as vocabulary types, the ecosystem can rely on their existence and slowly adopt their use, similarly to threads support introduction in C11, for example.

> My point regarding WG14 wasn't to add SDS as they are, rather vocabulary types for strings and arrays in the same spirit as SDS.

Well, yes, I'd love to see some proper string support too, so at least we're in agreement about that :-)

But, overhauling C with additional (memory-safe) array types and string types that are nonetheless still compatible with legacy uses is probably a non-starter anyway. The only way forward would be to add a new type that isn't compatible, which is unpalatable to a lot of people (myself included).

Adding memory-safe functions and/or semantics is easier, but will probably not cover 100% of the memory-safety desired.
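As a rough illustration of what such a function could look like, here is a sketch of a bounded copy that works on plain C strings and reports truncation. `copy_bounded` is a hypothetical name and signature, not a proposed or existing standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, which has room for dst_size
   bytes. Always NUL-terminates when dst_size > 0.
   Returns 0 if src fit completely, -1 if it was truncated. */
static int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t n = strlen(src);

    if (dst_size == 0)
        return -1;                       /* no room even for the NUL */
    if (n >= dst_size) {
        memcpy(dst, src, dst_size - 1);  /* keep what fits */
        dst[dst_size - 1] = '\0';
        return -1;
    }
    memcpy(dst, src, n + 1);             /* includes the NUL */
    return 0;
}

static void demo(void)
{
    char buf[4];
    assert(copy_bounded(buf, sizeof buf, "abc") == 0);
    assert(copy_bounded(buf, sizeof buf, "abcdef") == -1);
    assert(strcmp(buf, "abc") == 0);
}
```

It also shows the limit: nothing stops a caller from passing a `dst_size` larger than the real buffer, so the function is only as safe as its arguments.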

> When they exist as vocabulary types, the ecosystem can rely on their existence and slowly adopt their use, similarly to threads support introduction in C11, for example.

Threads, I feel, are a poor example for two reasons: 1) Hardly any code uses the `thrd_t` type for a variety of reasons, and 2) There was no need for a `thrd_t` type to be backward compatible with anything.

For full memory safety with C the only option is C Machines, meaning hardware memory tagging.

Already in use for a decade in Solaris SPARC, and eventually mainstream across all variations of ARM CPUs.

Unfortunately Intel botched their MPX implementation and now it is gone.

Apart from plain old fixed buffers, which is what is supported by C just fine and which covers 99% of string processing needs in the areas that C as a language is suited for anyway, ... there are 14 known ways of doing "strings" depending on circumstance, so I don't think it would be a good idea to introduce one mandatory version of them into the C standard. There is already C++ which has std::string, and there are a lot of GC'ed and scripting languages that are more suited for quick and dirty string processing.

The fact that C++ was able to eventually standardize on a single string type (despite the same mess of many dozens of incompatible implementations) shows that it is possible and desirable. It's not like raw buffers will go anywhere if you add a higher-level type. Nor does it have to be perfect - only "good enough" for use across the API surfaces of various libraries.

That's not really a problem if the only thing they need is direct access to a read-only view of the buffer (i.e. const char*) - then it's no different than C++ and std::string.
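
A minimal sketch of that idea in C, assuming a hypothetical `struct str` that keeps its buffer NUL-terminated and hands out a `const char *` view, much like `c_str()` does for std::string. All names here are made up for illustration; this is not sds and not a proposed standard type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-carrying string type. */
struct str {
    size_t len;  /* bytes in use, not counting the NUL */
    char *buf;   /* len bytes followed by a NUL; owned by the struct */
};

/* Initialize from a C string. Returns 0 on success, -1 on allocation failure. */
static int str_init(struct str *s, const char *src)
{
    size_t n = strlen(src);
    char *p = malloc(n + 1);

    if (p == NULL)
        return -1;
    memcpy(p, src, n + 1);  /* copy the NUL as well */
    s->len = n;
    s->buf = p;
    return 0;
}

/* Read-only view for legacy APIs that expect a const char *. */
static const char *str_cstr(const struct str *s)
{
    return s->buf;
}

static void str_free(struct str *s)
{
    free(s->buf);
    s->buf = NULL;
    s->len = 0;
}
```

A caller would pass `str_cstr(&s)` to `puts`, `fopen` and friends, and use `s.len` wherever the length is needed. Because the type is a struct rather than a pointer typedef, `const struct str *` parameters get ordinary compiler checking.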