Hacker News
by flohofwoe 1779 days ago
Zero-terminated strings make sense when you consider the peculiarities of the PDP-11 instruction set.

Snippet from "https://dave.cheney.net/2017/12/04/what-have-we-learned-from...":

One can write a string copy routine using two instructions, assuming that the source and destination are already in registers.

    loop:   MOVB (src)+, (dst)+
            BNE loop
The routine takes full advantage of the fact that MOV updates the processor flag. The loop will continue until the value at the source address is zero, at which point the branch will fall through to the next instruction. This is why C strings are terminated with zeros.
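For comparison, here is a minimal C sketch of the same loop. The name `copy_cstring` is made up for illustration (the standard library equivalent is `strcpy`); the point is only that the assignment expression doubles as the loop test, much as MOVB sets the condition codes that BNE then checks.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a zero-terminated string byte by byte: copy one byte, advance both
   pointers, and stop once the byte just copied was zero.
   copy_cstring is a hypothetical name used only for this sketch. */
char *copy_cstring(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    char *start = dst;

    /* The value of the assignment is the byte that was copied, so the
       terminating zero is copied first and then ends the loop. */
    while ((*dst++ = *src++) != '\0') {
        /* empty body: all the work happens in the condition */
    }
    return start;
}
```

The caller has to guarantee that `dst` has room for the whole string including the terminator; nothing in the loop can check that, which is exactly the complaint raised further down the thread.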

Except the world stopped being a PDP-11 around the 1980s, while ISO C refuses to update itself to modern times.

The biggest issue with C isn't its footguns, but rather WG14's unwillingness to provide additional language or library features that would allow for a safer C outside the low-level code where it pretends to be a portable macro assembler.

In the early years the job of the C committee wasn't to improve C, but merely to harmonize existing C implementations, and by the end of the 80s it was already too late: zero-terminated strings had already been baked into operating system APIs (i.e. zero-terminated strings are no longer primarily a language problem, but an ABI problem).

Besides, the x86 "repeat while" string instructions continued the PDP legacy.

That argument sounds akin to "Well people are used to driving without seatbelts, it'd be painful to make them switch now."

Or, as my spouse likes to joke when we get in the car to leave and realize we forgot something in the house: "it's too late, the door's shut."

I get why null-terminated strings once existed. It's baffling that they continue to exist 50 years later. Not to mention, they don't even work on data buffers, so you need fat pointers anyway!
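To make the data buffer point concrete, here is a rough C sketch. `byte_slice`, `slice_count` and `visible_as_cstring` are hypothetical names invented for this example, not part of any standard or library API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A "fat pointer": the address and the length travel together.
   byte_slice is a hypothetical type for illustration only. */
typedef struct {
    const unsigned char *data;
    size_t len;
} byte_slice;

/* Count how often a byte value occurs in the slice. Zero bytes are
   ordinary data here, not terminators. */
size_t slice_count(byte_slice s, unsigned char value)
{
    assert(s.data != NULL || s.len == 0);
    size_t n = 0;
    for (size_t i = 0; i < s.len; i++) {
        if (s.data[i] == value) {
            n++;
        }
    }
    return n;
}

/* How many bytes a zero-terminated API would "see" of the same buffer:
   everything before the first zero byte, or the whole slice if there
   is none. */
size_t visible_as_cstring(byte_slice s)
{
    if (s.len == 0) {
        return 0;
    }
    const unsigned char *zero = memchr(s.data, 0, s.len);
    return zero != NULL ? (size_t)(zero - s.data) : s.len;
}
```

For a five-byte buffer holding `'a', 'b', 0, 'c', 'd'`, `visible_as_cstring` reports 2 while the slice still covers all 5 bytes, so anything past the embedded zero is invisible to a zero-terminated interface.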

Replacing the string memory layout in operating system APIs is more similar to changing the track width on an existing railroad network, across the whole world.

It may be a good idea from a theoretical standpoint, but once you start calculating the cost it simply doesn't make sense.

Obviously not with that attitude. I'm not even talking about back-porting. I'm saying going forward, in new drivers and extensions, and new growth where it makes sense to do so.

There are even plenty of options for backwards-compatible strings and arrays, such as sds.

But really, the point was more that "this should have been addressed decades ago."

https://github.com/antirez/sds
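For anyone wondering how a string can be length-prefixed and still be handed to zero-terminated APIs: as far as I understand it, the trick in sds is to keep the length in a header placed before the pointer the caller gets, and to keep a terminating zero after the data anyway. The following is NOT the sds code or its API, just a minimal sketch of that layout idea; `lstr_header`, `lstr_new`, `lstr_len` and `lstr_free` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored immediately before the character data. */
typedef struct {
    size_t len;
    char buf[];   /* C99 flexible array member: the characters live here */
} lstr_header;

/* Allocate a length-prefixed copy of `len` bytes from `init` and return a
   pointer to the characters, not to the header. Returns NULL on failure. */
char *lstr_new(const char *init, size_t len)
{
    assert(init != NULL || len == 0);
    if (len > SIZE_MAX - sizeof(lstr_header) - 1) {
        return NULL;   /* size computation would overflow */
    }
    lstr_header *h = malloc(sizeof *h + len + 1);
    if (h == NULL) {
        return NULL;
    }
    h->len = len;
    if (len > 0) {
        memcpy(h->buf, init, len);
    }
    h->buf[len] = '\0';   /* keep a terminator so legacy APIs still work */
    return h->buf;
}

/* O(1) length lookup: step back from the characters to the header. */
size_t lstr_len(const char *s)
{
    const lstr_header *h =
        (const lstr_header *)(s - offsetof(lstr_header, buf));
    return h->len;
}

/* Free a string created by lstr_new; NULL is accepted and ignored. */
void lstr_free(char *s)
{
    if (s != NULL) {
        free(s - offsetof(lstr_header, buf));
    }
}
```

The returned pointer can be passed straight to `printf("%s")` or `open()`, while code that knows about the header gets the length without scanning and can store embedded zero bytes. The obvious catch is that only pointers that came from `lstr_new` may be given to `lstr_len` or `lstr_free`.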

Liability and lawsuits due to security exploits will take care of that.

Thankfully they are starting to pick up.

We aren't speaking about ANSI/ISO C89 here, but rather about what happened in the following 32 years.

That doesn't matter. Even C doesn't matter in that regard anymore, nor does the opinion of any other language on the best string memory layout. Once zero-terminated strings leaked into operating system APIs the damage was done forever, and there never was a good time to fix the problem afterwards.

That is no excuse not to try to improve the status quo.

Maybe liability and lawsuits are really needed to stop with such excuses.

I think some incremental improvements could be made to C; the question is whether it's worth the time.

Maybe it's better to just use a better tool for new code bases?

Except embedded developers and UNIX clones will never move beyond C.