by kazinator 1709 days ago
C is not "barely typed". It has quite a lot of type checking.

An expression like "obj.memb" in C requires obj to be declared to have a type which has a member "memb".

C catches it if you call a function with the wrong number of parameters, or wrongly typed parameters, such as passing a "struct foo *" pointer where a "struct bar *" argument is required.
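To make the two points above concrete, here is a minimal sketch. The names struct foo, struct bar, bar_value and demo are hypothetical, invented for illustration. The offending lines are commented out so the file still compiles; uncommenting any of them should draw a diagnostic from a conforming compiler (whether it is a warning or a hard error depends on the compiler and its version).

```c
#include <assert.h>

struct foo { int x; };
struct bar { int y; };

/* Hypothetical function that insists on a pointer to struct bar. */
int bar_value(const struct bar *b)
{
    return b->y;
}

int demo(void)
{
    struct foo f = { 1 };
    struct bar b = { 2 };

    /* Each commented-out line below should be diagnosed at compile time: */
    /* f.memb = 0;          struct foo has no member named memb          */
    /* bar_value(&f);       struct foo * where struct bar * is required  */
    /* bar_value(&b, 3);    too many arguments                           */
    /* bar_value();         too few arguments                            */

    assert(f.x == 1);       /* member access checked against the type */
    return bar_value(&b);   /* right number and type of arguments     */
}
```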

C has "holes" in the static safety net in areas like memory safety: object boundaries and lifetimes. It allows some unsafe conversions, like any object pointer to a void * and back. But not only those: C also has unsafe numeric conversions and operations.
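A small sketch of those holes, with hypothetical names throughout (struct foo, struct bar, void_round_trip, narrow, subtract). Everything here compiles without any diagnostic being required, and the value 44 assumes an 8-bit unsigned char.

```c
#include <assert.h>
#include <limits.h>

struct foo { int x; };
struct bar { int y; };

/* Round trip through void *: both conversions are implicit, and the
   second one lands on the wrong type without any required diagnostic. */
int void_round_trip(void)
{
    struct foo f = { 1 };
    void *p = &f;        /* struct foo * -> void *, no cast needed */
    struct bar *b = p;   /* void * -> struct bar *, silently accepted */
    /* Dereferencing b would be a bug; only the addresses are compared. */
    return (void *)b == (void *)&f;
}

/* Implicit narrowing: the value is reduced modulo UCHAR_MAX + 1. */
unsigned char narrow(int v)
{
    unsigned char c = v;
    return c;
}

/* Unsigned arithmetic wraps around instead of going negative. */
unsigned subtract(unsigned a, unsigned b)
{
    return a - b;
}

void demo(void)
{
    assert(void_round_trip() == 1);
    assert(narrow(300) == 44);          /* assumes CHAR_BIT == 8 */
    assert(subtract(0u, 1u) == UINT_MAX);
}
```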

Still, there is a type system there, and C programs greatly profit from it; it's the big reason why we have so many lines of C code in our computing infrastructure, yet the proverbial sky isn't falling. (Just the odd lightning or hail here and there.)

C compilers also help; modern compilers have a lot more diagnostic power than the compilers of thirty years ago. In C, it is critically important to diagnose more than the bare minimum that ISO C requires. For instance, whereas a function that has not been declared can be called in any manner whatsoever (any number of arguments), it's a bad idea to do that without issuing a diagnostic about an undeclared function being used. If such a diagnostic isn't enabled by default, it's a bad idea not to turn it on. C programmers have to understand the diagnostic power of their toolchain.

Recently, GCC 11 found a problem in some code of mine. I had converted malloc/free code for a trivial amount of memory to use alloca. But somehow I left in a free call. That was not diagnosed before, but now it was diagnosed.

Another obscure bug that a newer compiler with newer diagnostics caught for me in the last few years was in a piece of code where a comparison like this was being made:

   d <= UINT_PTR_MAX
where d is a double. The idea was to check whether d is in the range of a certain integer type before converting it. The trouble is that the above expression moves the goalposts: when UINT_PTR_MAX is 64 bits wide, its value is not necessarily representable in the double type. UINT_PTR_MAX gets converted to double, and in that process it goes to a nearby double value which happens to be greater than UINT_PTR_MAX! The range check then becomes wrong: it admits d values in that extended range, which are beyond the range of the integer type, causing undefined behavior in the conversion.
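Here is a sketch of the problem and one possible fix. It uses uint64_t and UINT64_MAX as stand-ins for the integer type in the original code, and the function names are hypothetical. It assumes IEEE 754 doubles with round-to-nearest, where 2^64 - 1 converts to exactly 2^64. The cast in naive_in_range is written out only to make visible what the implicit conversion in d <= UINT64_MAX already does.

```c
#include <assert.h>
#include <stdint.h>

/* The flawed check: the integer limit is converted to double first,
   and 2^64 - 1 rounds up to 2^64, so d == 2^64 is wrongly accepted. */
int naive_in_range(double d)
{
    return d >= 0.0 && d <= (double)UINT64_MAX;
}

/* One way to fix it: compare against 2^64 itself, which is exactly
   representable as a double, using a strict less-than.
   NaN fails both comparisons and is rejected. */
int safe_in_range(double d)
{
    return d >= 0.0 && d < 0x1p64;
}

/* Convert only after the range has been established. */
uint64_t to_u64(double d)
{
    assert(safe_in_range(d));
    return (uint64_t)d;
}
```

With these definitions, naive_in_range(0x1p64) is true while safe_in_range(0x1p64) is false, and the largest double that passes the safe check is 2^64 - 2048.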

In the field of formal type systems there are two common approaches to defining types. In practice they are quite similar, but in my opinion they differ a lot in framing.

One side can be represented by Haskell, Hindley–Milner type systems, or even Coq; here every value has its own "best" type that is intrinsically associated with it, that is, values and types are defined and constructed together.

On the other side you have a sort of formal definition of duck typing: you have values, and properties that are satisfied by some set of values. Here you start with your values (all numbers, all strings, all memory addresses) and express in the usual logic terms any property you want (e.g. this memory address must be either NULL or point to a string of even length).

All this to say that C has a nice type system from the first point of view (function pointers allow you to have higher-order functions!) but a very weak one from the second point of view, in that it is very hard to decide whether an operation will have a valid result just from the types of the values you feed into it (let's not talk about UB for now).
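A rough illustration of both views in C, with hypothetical names (apply_twice, add_three, null_or_even_length). The function pointer gives a type-checked higher-order function, while the "NULL or even-length string" property from above cannot be written as a C type, so it ends up as a runtime predicate over plain const char *.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First view: the type int (*)(int) is checked at every call site. */
int apply_twice(int (*fn)(int), int x)
{
    return fn(fn(x));
}

int add_three(int x)
{
    return x + 3;
}

/* Second view: the property lives outside the type system.
   The parameter type only says "pointer to char". */
int null_or_even_length(const char *s)
{
    return s == NULL || strlen(s) % 2 == 0;
}

void demo(void)
{
    assert(apply_twice(add_three, 1) == 7);
    assert(null_or_even_length(NULL));
    assert(null_or_even_length("ab"));
    assert(!null_or_even_length("abc"));
}
```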

In my opinion, in recent decades there has been a movement to care more about type systems that follow the second approach. I think it is one of the reasons for the success of TypeScript; its objective wasn't to have a nice type system full of good properties, but to model how JavaScript was being written.

C is a statically but weakly typed language, with very few compile-time checks and many non-intuitive implicit conversions.
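For anyone wondering which conversions are meant, here is a sketch of a few usual suspects. The function names are hypothetical. Some compilers can warn about the mixed signed/unsigned comparison if asked to, but the language does not require a diagnostic.

```c
#include <assert.h>

/* Integer promotion: both operands become int before the addition,
   so the sum does not wrap at the width of unsigned char. */
int sum_promoted(unsigned char a, unsigned char b)
{
    return a + b;
}

/* Usual arithmetic conversions: the int is converted to unsigned,
   so -1 turns into a huge value and the comparison is false. */
int int_less_than_unsigned(int a, unsigned b)
{
    return a < b;
}

/* Evaluation order matters: 1 / 2 is integer division and yields 0
   before the multiplication by a double happens. */
double half_times(double x)
{
    return 1 / 2 * x;
}

void demo(void)
{
    assert(sum_promoted(200, 100) == 300);
    assert(int_less_than_unsigned(-1, 1u) == 0);
    assert(half_times(2.0) == 0.0);
}
```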