Hacker News
by Spivak 92 days ago
In C-ish languages the statement

    int x = "thing";
is perfectly valid. It means reserve a spot for a 32 bit int and then shove the pointer to the string "thing" at the address of x. It will do the wrong thing and also overflow memory but you could generate code for it. The type checker is what stops you. It's the same in Python, if you make type checking a build breaker then the annotations mean something. Types aren't checked at runtime but C doesn't check them either.

In C, int may be as small as 16 bits. You may get 32 bits (or more) but it's not guaranteed. I don't see how you get a memory overflow though?

I'd be surprised if a compiler with -Wall -Werror agreed to compile this.

Trying to cast back the int to a char* might work if the pointers are the same size as int on the target platform, but it's actually Undefined Behaviour IIRC.

I guess an overflow would be possible if the sizes of a pointer and an int differ.
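
For anyone who wants to check the size question on their own machine, here is a minimal sketch in C. The function names are made up for illustration, and it assumes the platform provides uintptr_t, which the standard makes optional. As far as I recall, converting a pointer to an integer type is implementation-defined and only becomes undefined when the value does not fit, which is why the round trip goes through uintptr_t rather than int.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* How many bytes of a char * would not fit in an int on this platform. */
size_t bytes_lost_in_int(void) {
    return sizeof(char *) > sizeof(int) ? sizeof(char *) - sizeof(int) : 0;
}

/* Round-trip a pointer through uintptr_t (assumed to exist here). */
const char *round_trip(const char *p) {
    uintptr_t bits = (uintptr_t)(const void *)p; /* pointer -> integer */
    return (const char *)(const void *)bits;     /* integer -> pointer */
}

/* Print the sizes that the argument in the thread depends on. */
void report_sizes(void) {
    /* The standard only promises that int has at least 16 bits. */
    assert(sizeof(int) * CHAR_BIT >= 16);
    printf("int: %zu bytes, char *: %zu bytes\n", sizeof(int), sizeof(char *));
    printf("pointer bytes that would not fit in an int: %zu\n",
           bytes_lost_in_int());
}
```

Calling report_sizes() from main prints the two sizes for the current target. Nothing here writes outside any object, so the mismatch shows up as lost pointer bits rather than as memory being overwritten.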
It's valid in C, due to semantics around pointers. Try that in Java and you'll quickly find that it's not valid in "C-ish languages". C absolutely checks types, it's just weakly typed. Python doesn't check types at all, which I wouldn't have a problem with, if the language didn't have type annotations that sure look like they'll do something.
It won't "overflow memory".

This says there will be an immutable array of six bytes, with the ASCII letters for "thing" in the first five and then the sixth is zero, this array can be coerced to the pointer type char* (a pointer to bytes) and then (though a modern C compiler will tell you this is a terrible idea) coerced to the signed integer type int.

The six byte array will end up in the "read only data" section of the executable, it doesn't "overflow memory" and isn't stored in the x. Even if you gave x a more sensible type "char*" that word "thing" isn't somehow stored in your variable, it's a pointer.
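
A small sketch of the array-versus-pointer distinction described above, with hypothetical function names. It only relies on sizeof and strlen, and it makes no claim about which section the literal ends up in, since that is up to the implementation.

```c
#include <assert.h>
#include <string.h>

/* sizeof on the literal itself sees the whole array, terminator included. */
size_t literal_array_size(void) {
    return sizeof("thing");
}

/* sizeof on a pointer variable sees only the pointer, not the characters. */
size_t pointer_variable_size(void) {
    const char *x = "thing";
    return sizeof(x);
}

/* The variable holds an address; the sixth byte it points at is the zero. */
int sixth_byte_is_zero(void) {
    const char *x = "thing";
    assert(strlen(x) == 5); /* five letters before the terminator */
    return x[5] == '\0';
}
```

literal_array_size() returns 6 for the five letters plus the terminating zero, while pointer_variable_size() returns whatever sizeof(char *) is on the target, regardless of how long the string is.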

So, this isn't the same at all and you don't understand C as well as you thought you did.

Edited: fix escaping bold markers

I'm fairly certain that the C standard doesn't specify that string literals should be placed into .rodata, just that mutating them is UB.
That's true, the systems where C was created do not have the relevant features, and I would expect they can't even "protect" that text so that although it's UB it would have worked fifty years ago to attempt the mutation whereas today that will segfault on a Unix.
Compilers could choose to not place the literals into .rodata. Honestly, I don't know why it isn't at least an option on modern implementations.
I was talking about the int being 32 bits and the pointer being 64 bits but go off. If you did a naive codegen of this without type checking where the compiler just said "yes ma'am blindly copying the value to &x" then you would clobber adjacent memory. That's the point I'm making, you rely on the type checker to make the types actually mean things and give you safety guarantees.

It feels stronger in languages where you can't even produce a running program if type checking fails but it's conceptually the same.
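
To make the hypothetical concrete without actually corrupting anything, here is a sketch that simulates the "blind copy" inside a byte buffer. The buffer stands in for x followed by its neighbours, which is an assumed layout for illustration only. This is not what a C compiler emits for the assignment; a real compiler converts the pointer to an int-sized value instead.

```c
#include <assert.h>
#include <string.h>

/* Returns how many bytes past an int-sized slot a pointer-sized copy writes. */
size_t naive_copy_overrun(void) {
    /* Hypothetical layout: an int-sized slot followed by neighbouring bytes. */
    unsigned char memory[sizeof(int) + sizeof(char *)];
    const char *p = "thing";
    size_t i;

    memset(memory, 0xAA, sizeof memory); /* sentinel pattern everywhere */
    memcpy(memory, &p, sizeof p);        /* copy every pointer byte to the slot */

    /* Bytes beyond the copied region must still hold the sentinel. */
    for (i = sizeof p; i < sizeof memory; i++)
        assert(memory[i] == 0xAA);

    /* Anything copied past sizeof(int) landed in the neighbours. */
    return sizeof p > sizeof(int) ? sizeof p - sizeof(int) : 0;
}
```

On a target with 4-byte int and 8-byte pointers this returns 4, meaning four bytes next to the slot would have been overwritten by such a copy. On a target where the two sizes match it returns 0.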

Python does have strong types, it's just that it's dynamically typed - the variables don't have assigned types in Python itself (hence type annotations and third party type checking). C claims to have strong types but it is weakly checked and full of unwise coercions - however it is statically typed and so variables have types.
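
A few of the coercions in question, as a hedged sketch with made-up function names. Each of these is accepted by a C compiler without a cast, and whether you get a warning depends on the flags you pass.

```c
#include <assert.h>
#include <limits.h>

/* double -> int: the fractional part is silently discarded. */
int truncate_to_int(double d) {
    int i = d;
    return i;
}

/* int -> unsigned: negative values wrap around modulo UINT_MAX + 1. */
unsigned wrap_to_unsigned(int n) {
    unsigned u = n;
    return u;
}

/* Mixed comparison: the int operand is converted to unsigned first. */
int signed_less_than_unsigned(int a, unsigned b) {
    return a < b;
}

/* Checks the three results for sample inputs. */
void check_coercions(void) {
    assert(truncate_to_int(3.9) == 3);
    assert(wrap_to_unsigned(-1) == UINT_MAX);
    assert(signed_less_than_unsigned(-1, 1u) == 0); /* -1 became UINT_MAX */
}
```

Every variable here has a static type, so the compiler knows exactly which conversion it is inserting. The weakness is that it inserts them without being asked.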

If you want to see a language which does not have types you want the predecessor of C, B.

Imagining into existence a variant of C where assignment causes arbitrary memory overwrites isn't about type checking, that's not a "naive codegen" it's nonsense. If that was your point then you didn't do a good job of communicating it and it's still wrong.

Where does C claim to have strong typing?
The quote about C being "strongly typed but weakly checked" is usually attributed to Dennis Ritchie, one of C's co-creators. I am not able to pin it down to a recorded interview or written document but if you've used C you'll undoubtedly recognise the idea.