Hacker News
by marcosdumay 1071 days ago
> These statements are not true in languages which support operator overloading

I guess I will never understand C and Java developers' incredible fear of operator overloading.

Do you have the same reaction to user-defined functions? They are exactly the same thing. Or is it because of a bad type system that won't let you know which operator you are using?


> I guess I will never understand C and Java developers' incredible fear of operator overloading.

The answer is in the sentences right before the one you quoted:

> Relatedly, it's more explicit than almost any other language. If a line of code doesn't look like a function call, it's not calling anything. There is no hidden control flow.

Consider the use cases for C: operating system kernels, hard real-time software, low-level libraries, databases, embedded software. What is a common desire among these? Predictable low latency and high throughput.

It's much easier to achieve these properties if your language does not allow "magic." Implicit allocations, RAII, exceptions, overloaded operators: these are all examples of features that allow a library writer to inject hidden control flow into your code. This can make it very difficult to analyze why code runs slowly or with unexpected random pauses, not to mention making it much harder to step through in a debugger.
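A minimal C++ sketch of the hidden control flow described above, assuming a hypothetical `CheckedInt` type whose overloaded `+=` throws on overflow. The statement `a += b;` reads like plain arithmetic, yet it can transfer control out of the block, so the line after it never runs.

```cpp
#include <cassert>
#include <climits>
#include <stdexcept>

// Hypothetical wrapper type: its += is an ordinary function that may throw.
struct CheckedInt {
    int value;

    CheckedInt& operator+=(const CheckedInt& other) {
        // Detect signed overflow before it happens (the raw addition would be undefined behaviour).
        if ((other.value > 0 && value > INT_MAX - other.value) ||
            (other.value < 0 && value < INT_MIN - other.value)) {
            throw std::overflow_error("CheckedInt overflow");
        }
        value += other.value;
        return *this;
    }
};

// Returns true if the statement after `a += b;` was reached.
bool reaches_next_line(int x, int y) {
    bool reached = false;
    try {
        CheckedInt a{x};
        CheckedInt b{y};
        a += b;          // looks like arithmetic, but may throw
        reached = true;  // skipped if += threw
    } catch (const std::overflow_error&) {
        // control lands here instead
    }
    return reached;
}
```

`reaches_next_line(1, 2)` returns `true`, while `reaches_next_line(INT_MAX, 1)` returns `false`: nothing at the call site of `+=` hints at the jump.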

The control flow is the same; you evaluate the parameters, and then evaluate the operator. Just like any other function call, there's nothing implicit or hidden. The only difference is that you can't create other operators with the same name for different types.
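To make the "it's just a function call" point concrete: in C++ an overloaded operator is an ordinary function with a special name, and `p + q` on a user-defined type resolves to the same call as writing `operator+(p, q)` explicitly. The `Vec2` type and `add` function below are hypothetical, for illustration only.

```cpp
#include <cassert>

// Hypothetical 2-D vector type.
struct Vec2 {
    double x, y;
};

// A named function...
Vec2 add(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }

// ...and an operator that does exactly the same thing.
Vec2 operator+(Vec2 a, Vec2 b) { return add(a, b); }

// Returns true if all three spellings produce the same result.
bool same_result() {
    Vec2 p{1.0, 2.0}, q{3.0, 4.0};
    Vec2 r1 = p + q;            // infix syntax
    Vec2 r2 = operator+(p, q);  // explicit call of the same function
    Vec2 r3 = add(p, q);        // named function
    return r1.x == r2.x && r1.y == r2.y && r2.x == r3.x && r2.y == r3.y;
}
```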

And whether something is called or run inline is always decided by the compiler. Modern C doesn't promise you any relation between the way you break down the functions in your code and the actual function calls in the assembly it generates.

So I keep seeing people complain about overloading, always for the same reasons, which are patently invalid unless there's some implicit assumption they never state. What is that assumption that breaks the equivalence between user-defined functions and operators?

> Just like any other function call, there's nothing implicit or hidden.

The implicit part is the question of whether an operator is built-in or overloaded. In C, every operator is built-in, so you can look at a block of code and see that there are NO function calls in it. With something like C++, you must treat every operator like a function call.

With C, if I write:

    a += b;

I can be VERY confident that this line of code will execute in constant time. With C++ (or other operator-overloaded language), I cannot. I need to know what the types of a and b are, and I need to go look up the += operator to see what it does for these types (and this is not one universal place, it's specific to the type).
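A sketch of why the type matters for `a += b;`: with built-in `int` operands it is a single addition, but with a container-like type the same spelling can copy an arbitrary number of elements. `Buffer` and the global `copy_count` are hypothetical names used only to make the cost visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts element copies performed by Buffer::operator+= (for illustration only).
std::size_t copy_count = 0;

// Hypothetical buffer type whose += appends every element of the right-hand operand.
struct Buffer {
    std::vector<int> data;

    Buffer& operator+=(const Buffer& other) {
        const std::size_t n = other.data.size();  // captured first, so a += a is safe
        for (std::size_t i = 0; i < n; ++i) {
            data.push_back(other.data[i]);  // may also reallocate and move existing elements
            ++copy_count;
        }
        return *this;
    }
};

// Built-in +=: one addition regardless of the values.
int add_ints(int a, int b) {
    a += b;
    return a;
}

// User-defined +=: the same spelling performs n element copies.
std::size_t copies_for(std::size_t n) {
    Buffer a;
    Buffer b{std::vector<int>(n, 7)};
    copy_count = 0;
    a += b;
    return copy_count;
}
```

`copies_for(1000)` returns 1000: the line is O(n) in the size of `b` and may allocate, which is exactly what you cannot tell without looking up the type.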

Furthermore, this may be the last line within a particular scope. With C I know that nothing else will happen, and that the control flow depends only on the surrounding scope. With C++, I don't know this! Many objects may have been created within this scope, and now their destructors fire: potentially very large trees of objects get cleaned up and deallocated, and slow I/O operations may even run.
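A minimal sketch of that scope-exit behaviour, using a hypothetical `Node` type that owns its children through `std::unique_ptr`. The closing brace of the scope looks like nothing at all, yet it runs one destructor per node in the tree.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Counts how many Node destructors have run (for illustration only).
std::size_t destroyed_count = 0;

// Hypothetical tree node; it owns its children, so destroying the root destroys the whole tree.
struct Node {
    std::vector<std::unique_ptr<Node>> children;
    ~Node() { ++destroyed_count; }
};

// Builds a complete binary tree; depth 0 is a single node.
std::unique_ptr<Node> build(int depth) {
    auto node = std::make_unique<Node>();
    if (depth > 0) {
        node->children.push_back(build(depth - 1));
        node->children.push_back(build(depth - 1));
    }
    return node;
}

// Returns how many destructors ran when the inner scope closed.
std::size_t destructors_at_scope_exit(int depth) {
    destroyed_count = 0;
    {
        auto root = build(depth);
        int a = 1, b = 2;
        a += b;  // the "last line" of the scope is plain int arithmetic
    }  // ...but the closing brace runs ~Node for every node in the tree
    return destroyed_count;
}
```

With `depth = 10` the tree has 2^11 - 1 = 2047 nodes, so 2047 destructors run at that brace.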

> With C++ (or other operator-overloaded language), I cannot

All programming requires people to follow reasonable conventions. In C++, if you write a dereference operator that doesn't run in constant time, or an equality operator that doesn't follow equality semantics, the programmer messed up. It's like giving a function a misleading name, such as a `doThis()` that doesn't do this.

Note that Java is filled with these kinds of conventions, such as overriding `equals`. How can you be certain it actually obeys equality semantics? You have to trust the programmer.
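As a sketch of the "you have to trust the programmer" point: nothing in C++ (or in Java's `equals` contract) is enforced by the compiler. The hypothetical `Meters`/`Centimeters` types below compile cleanly, yet `a == b` and `b == a` disagree because one overload forgets the unit conversion.

```cpp
#include <cassert>

// Hypothetical unit types.
struct Meters      { int value; };
struct Centimeters { int value; };

// Converts units correctly...
bool operator==(Meters m, Centimeters cm) { return m.value * 100 == cm.value; }

// ...but this one "forgets" the conversion: a bug the compiler cannot catch.
bool operator==(Centimeters cm, Meters m) { return cm.value == m.value; }
```

`Meters{2} == Centimeters{200}` is `true`, while `Centimeters{200} == Meters{2}` is `false`, the same kind of broken contract as a non-symmetric `equals`.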

If I see `x+y` in C, I know 100% that it'll be ~0-1 instructions, O(1), and have the lowest latency and highest throughput an operation can have, i.e. it's basically ignorable when figuring out the performance of a piece of code or what complex things it may do (it also hints that the operands are pointers or numbers). For `f(x,y)`, none of those may hold. With operator overloading, `f(x,y)` and `x+y` tell you exactly the same number of facts at a glance, i.e. none: `x+y` becomes just another way to do an arbitrary thing.

In C, if I'm searching for how a certain thing may be called from a given function, I only have to look for /\w\(/ and never have to think about anything else.
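A rough sketch of that grep-style search using `std::regex` with the pattern `\w\(`. It flags `f(x, y)` but cannot flag `x + y`, which in C++ may also be a function call. The helper name `looks_like_call` is hypothetical, and the heuristic is deliberately crude (for example, it also matches `if(x)` written without a space).

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if the line contains a word character immediately followed by '('.
bool looks_like_call(const std::string& line) {
    static const std::regex call_pattern("\\w\\(");
    return std::regex_search(line, call_pattern);
}
```

In C, a line with no match contains no function calls. In C++, `z = x + y;` produces no match yet may still call `operator+`.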

Honestly, operator overloading isn't really that bad (especially if an IDE can highlight which operators are overloaded), but it still affects how one has to read code, even code that might not use it.

However, as a novice I found it unintuitive that on an embedded platform without hardware floats, x/y compiles, but into a polyfill with quite a few instructions.

That’s the only caveat. With operator overloading, the scope for what happens on a given line of code expands dramatically. Now your entire dependency graph is part of the search space. Heck, the operator might not even terminate at all!
> That’s the only caveat.

    a = b + c;

Is the addition done by itself, so it costs 1 clock cycle? Is it merged into some complex operation so the net cost is less than 1 cycle? Is it completely optimized away at compile time, so it's infinitely faster?

Does the addition trigger some trap that will run some distant code?

Is the addition on its own? Or are there store and load instructions that can stall for well over 1000 cycles?

I doubt you can answer any of those questions. All you and everybody else keep repeating is that you can micro-optimize C better because that line, which you expect to take anywhere from 0 to 2000 cycles, is certain not to do a call-and-return pair, which takes fewer than 10 cycles. All while the alternative almost certainly does exactly the same; you would just need to check.

Honestly, that argument doesn't make sense, and I keep reading it as people complaining that they want to micro-optimize a program but don't know whether it operates on native integers or 10-dimensional hypermatrices.

At the same time, every single person who is good at micro-optimization looks at the compiled binary as a first step, because C is a high-level language with little relation to the code the compiler actually generates.

For a long time I just shrugged it off and filed those complaints under "those people don't even know the language they are using". But the complaint's universality forces me to consider that there is a reason behind it, and maybe it's worthwhile to understand. Now, given that this is all the answer I get, it seems quite likely that even the people complaining don't consciously know what the problem is... But one thing is certain here: the people repeating that execution time is well known haven't actually done micro-optimization based on that fact.

Right, that's definitely quite a strong point against C's operator/function separation. A good argument can be made for simply not providing operations the hardware lacks as operators. But still, x/y won't touch any of your memory (assuming a non-broken stdlib), so you're still free to skip over it while scanning for a use-after-free or something.

User-defined functions require a function-call prologue and epilogue to be added around the machine instructions that implement the function's behavior. Typically this consists of growing the stack and adjusting the required pointers at the start, then undoing that at the end. In C, the operators defined by the language implementation do not involve any adjustments to the stack frame and do not emit a `call` or jump instruction in the assembly. Once operator overloading is possible, this difference immediately becomes blurred.

I would say that C macros have inspired concoctions of far greater magical qualities than, say, RAII. C programmers are not immune to violating the principle of least astonishment.

C functions are _typically_ globally defined, mostly unique identifiers, and that is a good thing for code readability.

Of course, C functions can be passed around as variables (function pointers). More broadly, they might be inline, be macros, or be #ifdef'd to different functions. But those cases are _typically_ recognized as undesirable and avoided.
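When a C function is passed around as a variable (a function pointer), the call site is still greppable but no longer tells you which code runs. A small sketch, written in the C subset of C++; `pick` and `apply` are hypothetical helpers.

```cpp
#include <cassert>

int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }

// Pointer to a function taking two ints and returning an int.
using BinOp = int (*)(int, int);

// Hypothetical selector: the target is decided at run time.
BinOp pick(bool subtract) { return subtract ? &sub : &add; }

int apply(bool subtract, int x, int y) {
    BinOp op = pick(subtract);
    return op(x, y);  // matches /\w\(/, but which function runs depends on `subtract`
}
```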

Java is a bit of a different story, and I can't find a good way to explain it: problems in large code bases are hard to explain, as a quick example rarely suffices. I've seen more than one bug caused by foo.bar(qux) calling a different bar method than the original programmer intended (both because foo's bar was overridden and because qux was a different type than expected).
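The "qux was a different type than expected" half of that bug is easy to reproduce in C++ (the commenter's case was Java, whose overload rules differ). With the hypothetical overloads below, passing a string literal selects the `bool` version: converting a pointer to `bool` is a standard conversion, which overload resolution prefers over the user-defined conversion to `std::string`.

```cpp
#include <cassert>
#include <string>

// Hypothetical overload set.
std::string bar(const std::string&) { return "string overload"; }
std::string bar(bool)               { return "bool overload"; }

std::string call_with_literal()    { return bar("qux"); }               // const char* -> bool wins
std::string call_with_std_string() { return bar(std::string("qux")); }  // exact match
```

The call `bar("qux")` looks like it should take the string overload, which is exactly the kind of surprise described above.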

Don't get me wrong, I would use operator overloading in a heartbeat if I were writing code for a math-y CS coding assignment. It's fine for code that will have a lifespan measured in weeks or months, with probably only 2 or 3 people ever looking at it.

Saying what you mean, as clearly and directly as possible, has its perks in certain applications (large code bases, life-critical code bases, code bases that will last for decades with dozens of programmers). In other words, cases where code is going to be read many more times than it is written.

To answer your question more directly: User definable functions aren't a problem. Re-definable functions are!
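A minimal C++ sketch of the "re-definable" case: a hypothetical `Base::bar` is overridden in `Derived`, so the identical call `foo.bar()` through a `const Base&` runs different code depending on the dynamic type of `foo`. The names mirror the `foo.bar(qux)` example above and are illustrative only.

```cpp
#include <cassert>
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string bar() const { return "Base::bar"; }
};

// Redefines bar(); nothing at the call site below reveals this.
struct Derived : Base {
    std::string bar() const override { return "Derived::bar"; }
};

std::string call_bar(const Base& foo) { return foo.bar(); }
```

`call_bar(Base{})` returns `"Base::bar"` and `call_bar(Derived{})` returns `"Derived::bar"`; the definition of `bar` itself is fixed, but which definition a given call reaches can be changed elsewhere.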