by RyanZAG 3897 days ago
Umm wait. Undefined behavior is where the language specification is not 100% precise, and compiler implementations can differ on produced code.

Go and Rust each have only a single implementation. The specifications for both are very brief. Are you claiming that a clean-room implementation of Go or Rust would always behave identically?

I have only one thing to say: I clicked through the Golang spec for 30 seconds and found this: https://golang.org/ref/spec#Run_time_panics

> The exact error values that represent distinct run-time error conditions are unspecified.

Oh, what's that? Undefined behavior? In the golang spec?!

3 comments

Your definition of undefined behaviour is actually the definition for unspecified behaviour.

Unspecified behaviour is usually intentional ambiguity, either to give wiggle room for an optimizer or to accommodate platform variance. Writing a program that invokes unspecified behaviour isn't normally a problem, as long as you're not relying on a specific result. Order of argument evaluation is a common example.
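To make the argument-evaluation example concrete, here is a minimal C++ sketch. The names `record`, `add`, `Outcome` and `evaluate_arguments` are made up for illustration. The two arguments may be evaluated in either order, so the log can differ between compilers, but the sum cannot.

```cpp
#include <string>

// Hypothetical helper: notes which argument was evaluated, then yields its value.
static int record(std::string& log, char tag, int value) {
    log.push_back(tag);
    return value;
}

static int add(int a, int b) { return a + b; }

struct Outcome {
    int sum;            // fully specified: always 1 + 2
    std::string order;  // unspecified: "ab" or "ba", depending on the compiler
};

// The two record() calls are arguments of the same call to add(), so the
// language does not say which one runs first. Each call still runs to
// completion before the other starts, so this is unspecified, not undefined.
Outcome evaluate_arguments() {
    std::string log;
    int sum = add(record(log, 'a', 1), record(log, 'b', 2));
    return {sum, log};
}
```

A program that only uses `sum` is portable. One that depends on `order` being a particular string is relying on the unspecified part.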

Relying on undefined behaviour is almost always bad, and almost always avoidable. That's where the nasal demons come from. Dereferencing null, alias-creating pointer typecasting, etc.

Undefined behavior has a very specific meaning for C. It doesn't mean "not 100% precise". It means 100% imprecise. The C standard only gives any guarantees on what your program will do if you never invoke UB. If you do, well then it can do literally whatever it wants including deleting all your files.

Literally deleting all your files at that. It's the part of the language that says your compiler is completely justified in allowing you to write that buffer overflow vulnerability that can trample executable memory, which is then used by a malicious attacker to do literally whatever they want, including deleting all your files. Or worse.

No, "undefined behavior" is a term with a specific meaning, because it is used for a very specific purpose. "Undefined behavior" means that a compiler can assume that a particular scenario will never happen in correct code, and therefore if it ever thinks it has to care about that scenario, it can in fact ignore it for the purpose of optimization.

For instance, if you have a bool in C++, the only defined behavior is for it to contain 0 or 1. If you somehow force the memory cell to contain 2 or 255 or anything else, you have triggered undefined behavior. This means the compiler never has to check for it. If it's faster for the compiler to implement, say, `if (b) x[1] = a; else x[0] = a;` as `x[b] = a`, it can do that. Even though the `b == 2` case might lead to a buffer overflow, the UB rules mean that, as far as the compiler cares, the `b == 2` case cannot exist. If `x.operator[]` is a function that does a bounds check, the compiler can inline it and remove the bounds check. And so forth.

Hence, the rule that on encountering UB, the compiler may do anything. It is not so much that the compiler is intentionally doing anything, as that it is outputting code that assumes the UB can't happen. If the compiler chooses to implement an `if (b)` via a computed jump, the `b == 2` case may land on some completely ridiculous code to start executing. It is not that the compiler wants to land there to punish you, it's that the compiler's only responsibility is to make sure the `b == 0` and `b == 1` cases work.
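The two forms from the example can be put side by side. This is only a sketch of the source-level equivalence, not a claim about what any particular compiler emits. The function names are made up, and `std::array<int, 2>` stands in for the `x` in the example.

```cpp
#include <array>
#include <cstddef>

// Branching form, as written in the example.
void store_branch(std::array<int, 2>& x, bool b, int a) {
    if (b) x[1] = a; else x[0] = a;
}

// Indexed form the compiler is allowed to substitute. A valid bool converts
// to exactly 0 or 1, so for every value the language permits, this writes
// the same element as the branching form and needs no bounds check.
void store_indexed(std::array<int, 2>& x, bool b, int a) {
    x[static_cast<std::size_t>(b)] = a;
}
```

The two functions agree for `b == false` and `b == true`, which are the only cases the compiler is responsible for. A bool whose storage has been forced to hold 2 is deliberately not exercised here, since the language gives it no meaning.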

What you're pointing to is unspecified values. This means that the implementation can return any value, but must actually return some coherent value. If a C++ function returning bool returns an "unspecified value", it is still only returning either 0 or 1. You can act with the resulting bool as if it is in fact a bool. There are no optimization gotchas to worry about. It's just that you don't know what it is.

For the particular case here, when you're inspecting a runtime error, the only useful thing to do with it is to call the Error() method from the error interface. The language guarantees you that it will work (i.e., you have defined behavior, that you have an object that indeed implements the error interface). It doesn't give you any particular guidance on exact error values or the resulting string. But that's no more "undefined behavior" than the result of reading from a file is "undefined".

The entire reason people care about undefined behavior is the fact that compilers can do arbitrarily-stupid(-seeming) things when it is present, which is solely because compilers want to optimize. In the case of unspecified values, there is nothing to optimize.

(Anyway, I am not a Go programmer at all. Maybe there is actually UB somewhere in safe Go. But what I have seen is not it.)