I'm surprised how fast a C function call is. I would have thought that creating a stack frame would be slower than that (and significantly slower than a floating-point division), but I guess not.
That's because nothing is created. A function call just puts some registers on the stack and jumps somewhere else. Stack growth etc. is typically handled implicitly by the OS (if a write to the stack page-faults, more stack memory is allocated).
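A minimal C sketch of that point, for illustration only. The names `add_one`, `call_n_times` and the `NOINLINE` macro are made up for this example, and the assembly in the comment is what a compiler would typically emit on x86-64, not a guarantee.

```c
#include <assert.h>

/* noinline is a GCC/Clang extension; without it the compiler would
 * most likely inline add_one and there would be no call at all. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* No frame object exists anywhere. The "frame" is just the slice of
 * the already-mapped stack below the caller's stack pointer. */
NOINLINE int add_one(int x) {
    return x + 1;
}

/* Calls add_one n times. On x86-64 each call is typically just:
 *     call add_one   ; push return address, jump
 *     ...
 *     ret            ; pop return address, jump back
 */
int call_n_times(int n) {
    assert(n >= 0);
    int acc = 0;
    for (int i = 0; i < n; i++) {
        acc = add_one(acc);
    }
    return acc;
}
```

Compiling with something like `-O2 -S` and reading the output is the easiest way to check what your own compiler does with it.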
Contrast that with e.g. an interpreter, where a stack frame would usually be a real, dedicated object that is allocated when a frame is needed.
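For comparison, a rough sketch of what an interpreter-style frame might look like, written in C. The `Frame` struct and the `frame_push`, `frame_pop` and `interp_add_one` functions are hypothetical and not taken from any real interpreter; the point is only that every call pays for an allocation and a free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-call frame object, allocated on the heap. */
typedef struct Frame {
    struct Frame *caller;  /* frame of the calling function, or NULL */
    int *locals;           /* storage for local variables */
    size_t nlocals;
} Frame;

/* Allocate a new frame linked to its caller; aborts if out of memory. */
Frame *frame_push(Frame *caller, size_t nlocals) {
    Frame *f = malloc(sizeof *f);
    if (f == NULL) abort();
    /* Allocate at least one slot so calloc never gets a zero size. */
    f->locals = calloc(nlocals ? nlocals : 1, sizeof *f->locals);
    if (f->locals == NULL) abort();
    f->caller = caller;
    f->nlocals = nlocals;
    return f;
}

/* Free a frame and return the caller's frame. */
Frame *frame_pop(Frame *f) {
    assert(f != NULL);
    Frame *caller = f->caller;
    free(f->locals);
    free(f);
    return caller;
}

/* An "interpreted" add-one: two allocations and two frees per call,
 * compared with a push and a jump for the native version. */
int interp_add_one(Frame *caller, int x) {
    Frame *f = frame_push(caller, 1);
    f->locals[0] = x + 1;
    int result = f->locals[0];
    frame_pop(f);
    return result;
}
```

Real interpreters often pool or reuse frames to soften this cost, but the frame is still an explicit object rather than a side effect of moving the stack pointer.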
Sometimes not even that. It might just update a single register and write nothing to cache or memory... and that cost might be hidden by out-of-order execution and the parallelism of different units on the CPU, making it effectively zero-cost.