| > Is the addition done by itself, so it costs 1 clock cycle? Is it merged into some complex operation so the net cost is less than 1 cycle? Is it completely optimized away at compile time, so it's infinitely faster? Those are generic instruction selection/optimization questions, which are always going to be *additional* complexity on top of any and all operations everywhere. So there's still a benefit to cutting down complexity elsewhere. > Is the addition by itself? Or are there store and load instructions that can stall for way more than 1000 cycles? Those are questions about the loads & stores, not the addition. On embedded, afaik, load & store latency will be significantly closer to arithmetic latency too. > At the same time, every single person that is good at micro-optimizations look at the compiled binary as a first step, because C is a high-level language that has little relation to the code the compiler actually creates. Yes, but having good intuition is still quite important, because one can think about & read code much faster than one can compile it & read the assembly. > the people repeating that execution time is well known didn't actually practice micro-optimizations based on that fact. The question of operator overloading is mostly about reading code, not writing it. And it doesn't have to be about micro-optimization either: any level of optimization is affected by a call happening where you don't expect one. Probably the most important case is when you scan over a piece of code to check whether it does anything suspiciously bad (e.g. O(n^2) behavior, excessive allocations, or whatever else may be expensive in the codebase in question), but it isn't worth the effort of diving into assembly or figuring out how to get representative data for profiling that specific thing. Or you could just be exploring a new codebase and want to track down where something happens, where it'd be beneficial to only have to scan through function calls and not operators too. |
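
To make the "call happening where you don't expect one" point concrete, here is a rough sketch in C++ (picked only because it has operator overloading; the thread isn't tied to it). Everything in it is made up for illustration: `Buffer`, `g_elements_copied`, `build_with_operator` and `build_with_call` are hypothetical names, not from any real codebase. The assumption is a type whose `+` returns a fresh copy, so a loop that reads like n cheap additions actually does O(n^2) element copies.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical counter: total elements copied inside operator+,
// kept only to make the hidden work observable.
std::size_t g_elements_copied = 0;

// Hypothetical type for illustration: `+` returns a new Buffer,
// so every use of `+` copies all existing elements (O(size)).
struct Buffer {
    std::vector<int> data;

    Buffer operator+(int value) const {
        Buffer result{data};                 // full copy of the current contents
        g_elements_copied += data.size();
        result.data.push_back(value);
        return result;
    }
};

// Reads like n cheap additions, but copies 0 + 1 + ... + (n-1) elements.
Buffer build_with_operator(std::size_t n) {
    Buffer acc;
    for (std::size_t i = 0; i < n; ++i) {
        acc = acc + static_cast<int>(i);
    }
    return acc;
}

// Same result, with the work spelled out as named calls:
// one reserve, then n appends, and no per-step copy of the buffer.
Buffer build_with_call(std::size_t n) {
    Buffer acc;
    acc.data.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        acc.data.push_back(static_cast<int>(i));
    }
    return acc;
}
```

Both functions produce the same contents, but when skimming only the second one shows its cost at the call sites. In the first, you have to already know what `+` does for `Buffer` to notice the quadratic copying.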