by krackers 646 days ago
Isn't this tautologically saying the compiler does sane things if you define "sane" as what the compiler allows? As Linus said, standards are written on toilet paper; in the real world you need to do signed overflow and type punning. In the real world, pretty much every system of note uses two's complement, and if there were a concern about maintaining compatibility with archaic systems, it could be made "implementation-defined" behavior instead.

To get something approximating "sane behavior" you have to set a dozen flags to disable various types of questionable optimizations.


No, C is totally sane for the machine it describes. If you want to describe a different machine, one in which objects can alias, where signed overflow has a specific meaning, etc, you can choose to describe that machine instead using compiler flags. In that case it is on you to understand what new, unstandardized machine you are describing. This is fine and good, nothing wrong with it, and often very useful.
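As a rough illustration of staying inside the standard machine instead of switching machines with flags, here is a minimal sketch in C. The helper names `add_would_overflow` and `wrapping_add` are made up for this example. The first checks for overflow before doing the signed addition; the second gets wraparound by doing the arithmetic in `unsigned`, where wraparound is defined. It assumes `int` and `unsigned` have the same width and that the implementation converts out-of-range values back to `int` modulo 2^N, which is implementation-defined before C23 but is what GCC and Clang document.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether a + b would overflow int,
   without ever performing the overflowing signed addition. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow here */
    if (b < 0)
        return a < INT_MIN - b;   /* INT_MIN - b cannot overflow here */
    return false;
}

/* Hypothetical helper: wraparound addition computed in unsigned
   arithmetic, where overflow is defined to wrap. The conversion back
   to int is assumed to be modular (implementation-defined pre-C23). */
int wrapping_add(int a, int b)
{
    return (int)((unsigned)a + (unsigned)b);
}
```

With GCC or Clang, `-fwrapv` gives plain `a + b` the same wrapping meaning, but then the program is written for that flag's machine rather than the standard one.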

The problem is with ever believing what you are describing with C has some 1-to-1 relationship with what the compiler is producing for a given real world hardware implementation. This delusion is unique to C. Fortran, Python, Java, Go, etc. programmers don't ever think about what the underlying hardware is doing with their code. They're writing code for an abstract machine defined by the language, whether by its implementation or by its standard.

> No, C is totally sane for the machine it describes.

Are brainfuck[1] or malbolge[2] sane? They do exactly what they say they will do, so following your logic they are sane, aren't they?

> The problem is with ever believing what you are describing with C has some 1-to-1 relationship with what the compiler is producing for a given real world hardware implementation.

I'm not sure that it is possible to be a good C programmer and not have any clue what the compiler will produce. You need to know, for example, that if you pass something big by value, it will lead to copying this value onto the stack. You'd better have some idea how many registers the compiler uses to pass arguments to functions, and how it uses them to pass structs. Or, as another example, you need to know how C lays out structures, so you will not end up with a structure that uses several times more memory than it needs.
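To make the struct layout point concrete, here is a small sketch in C. The struct and function names are invented for illustration. The standard only guarantees that members are laid out in declaration order with no padding before the first one; how much padding goes between and after members is up to the implementation and its ABI, so the saving below is an expectation on common ABIs, not a promise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: small members interleaved with large ones,
   which typically forces padding before each wider member. */
struct interleaved {
    char    tag;
    double  value;
    char    flag;
    int32_t count;
};

/* Same members, ordered from widest to narrowest, which typically
   leaves padding only at the tail of the struct. */
struct grouped {
    double  value;
    int32_t count;
    char    tag;
    char    flag;
};

/* Bytes saved by reordering on the current implementation.
   The result depends on the ABI; it may be zero. */
size_t padding_saved(void)
{
    return sizeof(struct interleaved) - sizeof(struct grouped);
}
```

Printing `sizeof` for both structs, or building with `-Wpadded` on GCC or Clang, shows where the padding actually goes on a given target.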

Besides this, you need to think about cache locality and other concepts that are not mentioned in the specification of the C abstract machine. And these things have a much greater impact on performance than the optimizations that become possible only when the compiler allows itself to go crazy with UB.

[1] https://en.wikipedia.org/wiki/Brainfuck [2] https://en.wikipedia.org/wiki/Malbolge

> Are brainfuck[1] or malbolge[2] sane?

In that they are sound and consistent, yes, any esoteric language is sane. Merely esoteric.

> I'm not sure that it is possible to be a good C programmer and not have any clue what the compiler will produce...

The compiler is under no requirement to pack structures in a specific way, or under any requirement to produce accesses in a cache pattern that mirrors your intent in the code. The compiler is required to produce a very specific set of observable behaviors, specifically:

> Volatile accesses to objects are evaluated strictly according to the rules of the abstract machine.

> At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.

> The input and output dynamics of interactive devices shall take place as specified in 7.23.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages appear prior to a program waiting for input.

And that's it. When thinking about the correctness of one's code, things like layout and cache access patterns are irrelevant because the abstract machine does not provide for such things. Thinking in that frame can only lead to errors.

Now, as a fully separate lens through which to view one's code, of course performance matters. Of course struct layout (as guaranteed by ABI standards like SysV and Windows) matters, and of course you should make the optimizer's job easier by doing sequential rather than scattered accesses. That's true of any programming language.

It's true for COBOL and Go and Ada, and yet when using those languages one does not try to reason about accessing objects via incompatible lvalues, as C programmers often try to do. This machine-optimization frame of thinking is not one you can use to reason about the behavior of the program; behavior needs to be conceived in the frame of the abstract machine.
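For what it's worth, here is a sketch in C of reading an object's bytes without going through an incompatible lvalue. The function name `float_bits` is made up for this example. The tempting version, `*(uint32_t *)&f`, reads a `float` through a `uint32_t` lvalue, which the abstract machine does not define. Copying the bytes with `memcpy` is defined, and compilers commonly reduce it to a single register move. The sketch assumes `float` is 32 bits wide, which the static assertion checks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the object representation of a float
   as a 32-bit integer by copying bytes, not by casting pointers. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    _Static_assert(sizeof bits == sizeof f,
                   "sketch assumes a 32-bit float");
    memcpy(&bits, &f, sizeof bits);  /* defined: byte-wise copy */
    return bits;
}
```

On an implementation with IEEE 754 single precision, `float_bits(1.0f)` gives `0x3F800000`; the standard itself does not require that representation.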