by ChrisRR 1162 days ago
I'm a C programmer by day and I disagree with them, C is only simple if you're trying to do simple tasks with it.

The very common thing I want to use in C is some sort of variable size string object. But no, I have to dynamically allocate a buffer that I know will be at least the right size for any text I ever put into it, or do I create a buffer that's the correct size for that string but re-alloc if I ever change it to a longer string. But then how do I store the buffer size? Do I want to create a struct that contains a pointer to the buffer and the length, or use sizeof() to calculate the string length? But then I can't use sizeof() if I pass that buffer into a function via a pointer. If I pass that string to a function, is it being copied straight away, or is the function just storing the pointer so I can't change the string at a later date? I can't enforce copy semantics.

And god forbid you ever forget to include space for the terminating NUL.

I just want a string I can dump some text in. I don't want to go searching for libraries, and I don't want to have to consider allocation and copying and all that crap. If I wrote half of my boilerplate C code in Python it would look just as simple and beautiful (if not more).
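
To make the sizeof and terminator complaints concrete, here is a minimal sketch. The names `size_seen_by_callee`, `struct text` and `text_wrap` are invented for illustration and do not come from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Once a buffer is passed to a function it is just a pointer, so
   sizeof reports the size of the pointer, not of the buffer. */
static size_t size_seen_by_callee(const char *buf) {
    return sizeof buf;
}

/* The usual workaround: carry the sizes alongside the pointer. */
struct text {
    char  *data;
    size_t cap;  /* bytes available, including room for the terminating NUL */
    size_t len;  /* bytes in use, excluding the NUL */
};

/* Wraps an existing NUL-terminated buffer; does not copy or take ownership. */
static struct text text_wrap(char *buf, size_t cap) {
    assert(buf != NULL && cap > 0);
    struct text t = { buf, cap, strlen(buf) };
    return t;
}
```

With `char greeting[] = "Hello";`, `sizeof greeting` is 6 in the declaring scope (five letters plus the NUL), while `size_seen_by_callee(greeting)` only yields `sizeof(char *)`.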


>I'm a C programmer by day and I disagree with them, C is only simple if you're trying to do simple tasks with it.

C gives you enough rope to shoot yourself in the foot. And rightfully so. It came out in a time when everyone was coding assembly. It's meant not to hold you back from doing voodoo with low-level stuff, therefore it won't hold your hand.

Not very practical in the world of today when we've been spoiled by 'better' languages and you need to quickly ship stuff that mostly works without worrying about the little things, but at the time it was revolutionary.

> when everyone was coding assembly..

This was before my time, but I think it's a common misconception (only true for operating system development). When C was created, there was already Lisp, Cobol, Fortran, Algol, Simula, BASIC... and Smalltalk and Prolog were just around the corner (and most of those are much higher level than C).

Sure, but none of those were for systems programming, which is squarely the domain that C was aimed at. Case in point: the first thing that C was used to write was UNIX (before then it was BCPL, and this was iirc before C even had structs, which made that a very tricky job; once structs were in place it got a lot easier). Probably Don Hopkins has more knowledge about this.
> before C even had structs which made that a very tricky job...

...interesting that you mention that, I think that functions and structs are the essential 'core abstraction tools' that get you to at least 80% of any higher level abstractions that were invented since then, and this is exactly the reason why C is still quite popular. Its feature set is just enough to be considered a high level language which enables abstractions, but not more (especially no fads and fashions that came and disappeared again).

C23 has lots of stuff in it, except improved safety.
There were enough languages for systems programming outside Bell Labs, all the way back to 1958.

It is an urban myth that C was the first one, usually pushed, naturally, by UNIX folks.

JOVIAL, ESPOL, NEWP, PL/I, PL/S, PL.8, PL/M, Bliss, Mesa, Modula-2, VMS Basic, VMS Pascal,...

I think parent knows. A more accurate description would be that everyone in the intended audience was writing assembly. Yes, there were other languages of higher levels, but C was not invented to help their users. And since they are also not really what Rust targets either, IMO it’s reasonable to shorten it and drop the qualifier in this context.
C was invented in the context of porting UNIX, while the world outside Bell Labs used something else, that is all there is to it.
> C gives you enough rope to shoot yourself in the foot.

If this was intended to illustrate the unexpected consequences of undefined behavior it succeeded remarkably well!

Undefined Behaviour was a very late 'addition' to C, it only became necessary when C was standardised around 1990. And only after two more decades passed, UB became an actual problem when compiler vendors decided that it's fine to exploit it for optimization tricks.
“Decided that it’s fine to exploit it for optimisation tricks” is a poor characterisation. The reality is, if you define particular behaviours you will harm performance in some cases. If you define how something in particular should happen, then all architectures will need to implement that, regardless of their underlying semantics.

E.g. C leaves signed int overflow undefined. In most cases it has a predictable effect on modern, mostly similar architectures, but that is by no means guaranteed, and forcing an architecture to calculate overflow a particular way seems like a negative.

That being said, everyone has a pet example of a compiler doing some really odd and deep optimisations - I suspect that’s mostly due to successive layers and optimisers adding up to have unexpected effects, rather than a deliberate effort by compiler writers - but I’m no expert on the matter.
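
For the int overflow case mentioned above, a hedged sketch of how code can stay out of the undefined case by testing before it adds. `add_checked` is a hypothetical helper written for this example, not a standard function.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Stores a + b in *out and returns true, or returns false (leaving *out
   untouched) when the mathematical sum does not fit in an int.
   The range test happens before the addition, so the signed addition
   itself can never overflow. */
static bool add_checked(int a, int b, int *out) {
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

Writing the check as `a + b < a` instead would perform the overflowing addition first, which is exactly the kind of test an optimiser is allowed to assume is never true.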

> UB became an actual problem when compiler vendors decided that it's fine to exploit it for optimization tricks.

Section 4. Conformance says "A strictly conforming program shall only use those features of the language and library specified in this International Standard. It shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior, and shall not exceed any implementation limit."

Compilers are not allowed to produce output dependent on UB for strictly conforming ISO C programs, they must optimize those statements out. Treating UB as impossible is required for ISO C. It's NOT required for GNU C, or Clang C, or Microsoft Visual C, but they usually do so anyway (even though they're not compiling strictly conforming ISO C programs).

Did you miss their point? They merely used UB, correctly, as a term we all do recognize today.

I will utterly kill all humor by explaining it:

They made a funny observation about a thing that happens. The thing that happens is (today) called UB. The funny observation is that that comment kind of exhibited the outward appearance of what the effects of UB could look like.

It began reciting one metaphor, "enough rope to hang yourself", but midway unexpectedly switched to a different metaphor, "shoot yourself in the foot", producing a combined, invalid, nonsensical output. As though a program suffered some UB in the routine for looking up and printing metaphors.

The comment author might have done it on purpose. Maybe they intended to make exactly that joke.

The history of the term UB has no more bearing than the history of any of the other words used.

>It came out in a time when everyone was coding assembly

Expect to hear from @pjmlp on this!

>The very common thing I want to use in C is some sort of variable size string object. But no, I have to dynamically allocate a buffer that I know will be at least the right size for any text I ever put into it, or do I create a buffer that's the correct size for that string but re-alloc if I ever change it to a longer string. But then how do I store the buffer size?

As a C programmer shouldn't you have a library abstracting all this by now? Either your own or one of the dozens available, including pascal-style strings?

Okay, so you write your own string library. Now you'd like to do the same thing for resizable arrays, so you write a resizable array li… oops, you can't, because C doesn't have parametric types.
He didn't say "write your own library". He said "use" one (your own if you prefer). Or are you going to suggest there are no good string handling libraries for C?
It doesn't matter. My point is that it's not possible, not for you and not for anyone else, to implement a type-safe generic resizable array library in C.
Preprocessor macros are an entirely valid tool in the C language toolbox, even if demonized by C++ coders.
Even Macro Assemblers from the MS-DOS and Amiga days are better than C pre-processor macros.
It is black magic; absolute respect to C developers.
But the preprocessor is just simple text replacement, nothing magic about it.
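
For readers wondering what the macro approach to a generic resizable array looks like, here is a minimal sketch. `DEFINE_VEC`, `intvec` and `sum_first` are names invented for this example; it assumes one macro expansion per element type and ignores size overflow in the capacity calculation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* DEFINE_VEC(T, NAME) generates struct NAME plus NAME_push and NAME_free.
   Each expansion is a separate, fully typed set of functions, so pushing
   the wrong element type is diagnosed like any other type mismatch. */
#define DEFINE_VEC(T, NAME)                                        \
    struct NAME { T *items; size_t len, cap; };                    \
    static int NAME##_push(struct NAME *v, T value) {              \
        if (v->len == v->cap) {                                    \
            size_t new_cap = v->cap ? v->cap * 2 : 8;              \
            T *p = realloc(v->items, new_cap * sizeof *p);         \
            if (!p) return -1;                                     \
            v->items = p;                                          \
            v->cap = new_cap;                                      \
        }                                                          \
        v->items[v->len++] = value;                                \
        return 0;                                                  \
    }                                                              \
    static void NAME##_free(struct NAME *v) {                      \
        free(v->items);                                            \
        v->items = NULL;                                           \
        v->len = v->cap = 0;                                       \
    }

DEFINE_VEC(int, intvec)

/* Example use: collect 0..n-1 in the generated vector and add them up. */
static long sum_first(int n) {
    assert(n >= 0);
    struct intvec v = {0};
    long total = 0;
    for (int i = 0; i < n; i++) {
        if (intvec_push(&v, i) != 0) break;  /* allocation failed: stop early */
    }
    for (size_t i = 0; i < v.len; i++) total += v.items[i];
    intvec_free(&v);
    return total;
}
```

Whether this counts as "type-safe generic" is the disagreement in this subthread: the generated code is typed, but the macro body itself is only checked once it is expanded.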
My C knowledge is obviously old. I wonder if the author is lamenting the lack of "modern" string manipulation in the standard library beyond just working on char buffers?
It's funny how both C and LISP programmers seem to suffer from NIH to the point that they'll roll their own just for the heck of it rather than to first see if there is a library that they can use.

The long term cost of those decisions as well as the number of really bad bugs (and security issues) that can be traced back to one-off code is likely much larger than the same figure for well used libraries. But it all sort of evens out whenever a bug in such a library is found because then it is so widespread that lots of systems will suffer.

Weird how that works.

>It's funny how both C and LISP programmers seem to suffer from NIH to the point that they'll roll their own just for the heck of it rather than to first see if there is a library that they can use.

Well, not sure about the LISP programmers, but C programmers have a good reason: they work under different environments (from embedded to Windows, legacy UNIX, the latest Ubuntu, ...) and also have different needs, regarding allocation, string management, etc. So one-size-fits-all lib might not cut it for everybody. It can also be as simple as having an inherited codebase which uses something else.

Still, there are popular string libraries, and C programmers do use them when they can.

What exactly is a problem with C++’s strings, or especially Rust’s? Everything you mentioned can be controlled as explicitly as you want.

The only problem is C is simply not expressive enough to have proper abstractions like that.

I don't even know where to start with C++ stdlib string problems, but being mutable and doing a unique heap allocation (above a certain length - a behaviour which however isn't even standardized) are definitely at the top of the list.

std::string_view would have been a good thing if it hadn't added another memory corruption foot gun.

A universal string type is one of those things where you can either have convenience or performance, but never both.

They are still better than str...() and mem...() all over the place, and all relevant C++ compilers have options to turn bounds checking on.
Last time I checked "Formatting is Unreasonably Expensive for Embedded Rust" (1)

And/or you need a crate like alloc, heapless, etc.

For the sake of a sprintf "abstraction"...

(1) https://jamesmunns.com/blog/fmt-unreasonably-expensive/

It’s not really a sprintf abstraction.
If all you do is write code on Linux/Windows/MacOS, C++ strings might be fine. Things are different in the wider world. Many places I use C don't even have a C++ compiler (embedded, in particular).
>What exactly is a problem with C++’s strings, or especially Rust’s? Everything you mentioned can be controlled as explicitly as you want.

Everything else, however, cannot.

Like what? C++ and Rust can do everything C can.
I have something like this that serves me well:

  struct strbuf { size_t cap, len; char *str; };
  void sb_setf(struct allocator *a, struct strbuf *sb, const char *fmt, ...);
  void sb_appendf(struct allocator *a, struct strbuf *sb, const char *fmt, ...);
  // have other convenience functions for formatting fixed point values like "prefix AAA.BBB suffix" ("voltage: 7.23 V")
  // special helpers for dates, times, etc.
Just keep building that library up and you'll have growable buffers, strings, lists, and a hashmap (uintptr -> uintptr is all you need in 99% of cases, I've found, maybe with some helper functions for string key -> void * built on top of uintptr -> uintptr). Replace or rewrite the standard library to operate on these types instead and you're good to go.
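
As a rough idea of what sits behind a declaration like `sb_appendf`, here is a sketch under stated assumptions: it drops the `struct allocator` parameter and uses plain `realloc`, returns an error code instead of `void`, and adds a hypothetical `sb_free`. It is not the commenter's actual implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

struct strbuf { size_t cap, len; char *str; };

/* Appends printf-style formatted text to sb, growing the buffer as needed.
   Returns 0 on success, -1 on failure (sb is left unchanged on failure). */
static int sb_appendf(struct strbuf *sb, const char *fmt, ...) {
    assert(sb != NULL && fmt != NULL);

    /* First pass: measure how many bytes the formatted text needs. */
    va_list ap;
    va_start(ap, fmt);
    int need = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (need < 0) return -1;

    size_t want = sb->len + (size_t)need + 1;  /* +1 for the terminating NUL */
    if (want > sb->cap) {
        size_t new_cap = sb->cap ? sb->cap : 16;
        while (new_cap < want) new_cap *= 2;
        char *p = realloc(sb->str, new_cap);
        if (!p) return -1;
        sb->str = p;
        sb->cap = new_cap;
    }

    /* Second pass: format directly into the free tail of the buffer. */
    va_start(ap, fmt);
    vsnprintf(sb->str + sb->len, sb->cap - sb->len, fmt, ap);
    va_end(ap);
    sb->len += (size_t)need;
    return 0;
}

/* Releases the buffer and resets sb to the empty state. */
static void sb_free(struct strbuf *sb) {
    free(sb->str);
    sb->str = NULL;
    sb->cap = sb->len = 0;
}
```

Starting from `struct strbuf sb = {0};`, a call such as `sb_appendf(&sb, "voltage: %d.%02d V", 7, 23)` leaves `sb.str` holding "voltage: 7.23 V", with `sb.len` tracking the length so no caller ever needs `strlen` or `sizeof`.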
That works until you want to use code from somebody else who also has something like that that serves them well, but is slightly different.

If you’re extremely lucky, things will compile and work.

If you’re just lucky things won’t compile, and you’ll have to write conversion functions (or macros).

If you’re unlucky, there will be subtle differences, likely poorly documented, between the libraries, and code will compile but have subtle bugs.

Of course, other languages have that problem, too, but at the higher level of JSON parsers or graphics libraries, not at the basic level of strings, lists, or maps.

Could even use something like Gnome's GLib
> But then I can't use sizeof() if I pass that buffer into a function via a pointer.

Obviously, this isn't necessarily an improvement, but in theory you could do this:

  size_t getlength( size_t m, char (*p) [m] ) { return sizeof *p; }
where you'd be calling it like this:

  char mystring[] = "Hello";
  size_t n = getlength( sizeof mystring, &mystring );  // n is 6: five letters plus the terminating NUL
But then you're back in "having to pass the length as a separate parameter" territory, I guess. (but at least it's the length of the array here, not just the zero-terminated component, which is what you wanted).