by hDeraj 3580 days ago
When calling a simple function like this within a large loop, would it make a noticeable difference in speed to inline the computation vs. having a function call? If so, what's the best practice for inlining a computation like this? I imagine a macro would be the simplest solution, but I'm interested to hear about any other techniques that are used.
3 comments

> When calling a simple function like this within a large loop, would it make a noticeable difference in speed to inline the computation vs. having a function call?

Maybe. Inlining a small function removes the call/ret instructions and the overhead of passing arguments, and for very small functions it can also reduce instruction-cache pressure, since the call sequence may be longer than the body itself. Moreover, inlining allows further optimizations that can't be done across a function boundary. It may be noticeable, or it may not. It depends on the loop.
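To make the comparison concrete, here is a minimal sketch of the two forms being weighed. The original function isn't shown in this thread, so max_int is a hypothetical stand-in, and max_of_call / max_of_manual are made-up names for illustration. Both loops compute the same result; whether they compile to the same machine code depends on the compiler and optimization level.

```c
#include <stddef.h>

/* Hypothetical stand-in for the "simple function" in the question. */
static inline int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Form 1: call the helper on every iteration.
 * Assumes n >= 1 so that v[0] is valid. */
static int max_of_call(const int *v, size_t n)
{
    int best = v[0];
    for (size_t i = 1; i < n; i++)
        best = max_int(best, v[i]);
    return best;
}

/* Form 2: the same computation written out by hand in the loop body.
 * Assumes n >= 1 so that v[0] is valid. */
static int max_of_manual(const int *v, size_t n)
{
    int best = v[0];
    for (size_t i = 1; i < n; i++)
        if (v[i] > best)
            best = v[i];
    return best;
}
```

The only reliable way to know whether the difference matters for a given loop is to measure both forms with optimizations on, or to look at the generated assembly.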

> what's the best practice for inlining a computation like this?

There are a lot of examples in the Linux kernel. Here is a random one from include/linux/list.h:

  static inline void list_replace(struct list_head *old,
				struct list_head *new)
  {
	new->next = old->next;
	new->next->prev = new;
	new->prev = old->prev;
	new->prev->next = new;
  }

The 'static' keyword lets the compiler skip emitting a callable (non-inlined) copy of the function when every call has been inlined, and it also makes it possible to define such functions in header files. The compiler can't inline a call if it doesn't have the function's definition at compile time; a declaration is not enough. That is why such functions are usually defined in headers, where 'static' becomes necessary. When the function is defined in a *.c file, 'static' can be omitted, but it is probably better to keep it.

In C++ such functions will be methods in most cases, and in that case "static" would be unnecessary and wrong.

You'll also need to turn on optimizations when compiling. Compilers generally don't inline when optimization is off.
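As a rough sketch of the header pattern described above: the file name clamp.h and the function clamp_int are hypothetical, chosen only for illustration. The definition lives in the header behind an include guard, so every *.c file that includes it sees the full body.

```c
/* clamp.h (hypothetical header name) */
#ifndef CLAMP_H
#define CLAMP_H

/* 'static' gives each including .c file its own private copy, so there
 * are no duplicate-symbol errors at link time. 'inline' is a hint; the
 * optimizer still makes the final decision. */
static inline int clamp_int(int x, int lo, int hi)
{
    if (x < lo)
        return lo;
    if (x > hi)
        return hi;
    return x;
}

#endif /* CLAMP_H */
```

A caller would just #include "clamp.h" and call clamp_int(value, 0, 255). With GCC or Clang, building with something like -O2 is what normally gets the call inlined.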

A call and a return are already two instructions, and this simple function is essentially also 2 (or 1 if you use a conditional move as illustrated). Passing the arguments and making the call alone takes more instructions than the function body itself. Inlined, it'll be both smaller and faster, so there are no tradeoffs. To me, this is clearly in the "yes, definitely inline it" category.

It can sometimes make a difference, but usually the compiler's optimizer does a good job of deciding whether or not a function should be inlined.

If you want, you can nudge the compiler in the direction you want via the "inline" keyword, although the compiler won't always take this suggestion to heart. MSVC has "__forceinline", but it too will not always comply.
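One common way to wrap these hints is a small portability macro. This is only a sketch: FORCE_INLINE and abs_diff are made-up names, and the GCC/Clang always_inline attribute is my addition, not something mentioned above. Even with these, the compiler can still refuse in some cases.

```c
/* Pick the strongest inlining hint the compiler offers.
 * FORCE_INLINE is a hypothetical macro name. */
#if defined(_MSC_VER)
#define FORCE_INLINE static __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline   /* plain hint as a fallback */
#endif

/* Hypothetical small helper. Assumes the difference fits in an int. */
FORCE_INLINE int abs_diff(int a, int b)
{
    return a > b ? a - b : b - a;
}
```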

Before the "inline" keyword, macros were the standard way to do this, IIRC.
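One reason inline functions are usually preferred over macros today: a macro pastes its argument text wherever it appears, so an argument with side effects can be evaluated more than once. A small sketch, with all names (SQUARE_MACRO, square_fn, next_value and the two counters) invented for the illustration:

```c
/* Macro version: the argument text appears twice in the expansion. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Function version: the argument is evaluated exactly once. */
static inline int square_fn(int x)
{
    return x * x;
}

static int calls;   /* counts how often next_value() runs */

static int next_value(void)
{
    calls++;
    return 3;
}

/* How many times is next_value() called through the macro? */
static int macro_eval_count(void)
{
    calls = 0;
    (void)SQUARE_MACRO(next_value());
    return calls;
}

/* How many times is next_value() called through the function? */
static int fn_eval_count(void)
{
    calls = 0;
    (void)square_fn(next_value());
    return calls;
}
```

The macro expands to ((next_value()) * (next_value())), so the argument expression runs twice, while the function form runs it once.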

There's something funny about a compiler being able to ignore something called "force inline".

Sometimes it's not possible to inline a function, for example when it calls itself recursively.
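A quick sketch of the recursive case, using hypothetical names fact_recursive and fact_iterative. The recursion depth depends on n at run time, so the body can't simply be pasted into the caller in full; compilers may still inline a few levels or turn some recursion into a loop, but that is up to the optimizer.

```c
/* Recursive: the number of nested calls depends on n at run time.
 * Assumes n is small enough that the result fits in unsigned long. */
static unsigned long fact_recursive(unsigned n)
{
    return n <= 1 ? 1UL : n * fact_recursive(n - 1);
}

/* Iterative rewrite: no self-call, so it is an ordinary inlining
 * candidate. Same assumption about the size of n. */
static inline unsigned long fact_iterative(unsigned n)
{
    unsigned long r = 1;
    for (unsigned i = 2; i <= n; i++)
        r *= i;
    return r;
}
```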