|
Suppose you have a function `a` that calls two other functions in succession, `b` and `c`. Further suppose `b` and `c` each have one large stack variable (e.g. an array). Ideally we should be able to reuse the same stack memory for both variables, since they aren't in use at the same time.

There are three cases: neither `b` nor `c` is inlined; both are inlined; or only one is inlined. If neither is inlined, the memory is always reused. If both are inlined, the memory is usually reused but not always. If only one is inlined, the memory is never reused. Therefore, marking functions no-inline can indeed reduce stack usage, but it depends on the situation.

Details:

Case 1: Neither `b` nor `c` is inlined. Then `b` will push its stack frame and pop it when it's done, then `c` will do the same with its stack frame, reusing the same memory.

Case 2: Both `b` and `c` are inlined. Then both of their variables will be part of `a`'s stack frame. A naive compiler would put each variable in a separate location in the stack frame, so the size of the stack frame would be at least the sum of the sizes of `b`'s and `c`'s variables, wasting stack space. However, most compilers can determine that the variables' lifetimes don't overlap and reuse the same part of the stack frame for both. (LLVM calls this "stack coloring", for reference.)

Most compilers, but not all. In a simple test on gcc.godbolt.org, GCC, MSVC, and Clang all normally perform this optimization at all optimization levels [1]. But ICC (Intel C compiler) fails to perform it, allocating space for both variables even at `-O3`. And there are many more obscure C compilers (not that commonly used these days, but they exist), some of which presumably have the same problem.

Case 3: One of `b` and `c` is inlined but the other isn't. Suppose `b` is the one inlined. `b`'s variable will be incorporated into `a`'s stack frame, but when `a` then calls `c`, `c` will push its stack frame on top of `a`'s. In theory, `a` could dynamically reduce the size of its stack frame before calling `c`, but as far as I know, no major compilers do this, regardless of the optimization level. Therefore, the memory cannot be reused.

[1] Test case: https://gcc.godbolt.org/z/nY4ddz7q1

Note: Clang actually doesn't perform the optimization at `-O0`, but at `-O0` functions are never inlined unless forced to be with `always_inline`, which should be used sparingly. So it's not a concern in typical situations. MSVC, for its part, doesn't inline functions with optimization disabled (`/Od`) even if they are marked `__forceinline`. |
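To experiment with the three cases yourself, here is a minimal sketch of the setup described above. It is not the linked test case: the `NOINLINE` macro, `BUF_SIZE`, and the function bodies are illustrative assumptions. The attribute spelling assumes GCC or Clang, with `__declspec(noinline)` as the MSVC equivalent.

```c
#include <string.h>

/* Assumption: GCC/Clang attribute syntax; MSVC uses __declspec(noinline). */
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

/* Hypothetical size, chosen only to make the arrays "large". */
enum { BUF_SIZE = 4096 };

/* b owns one large stack array; it lives in b's own frame when b is not inlined. */
NOINLINE static unsigned b(unsigned seed)
{
    unsigned char buf[BUF_SIZE];
    memset(buf, (int)(seed & 0xFF), sizeof buf);
    return buf[seed % BUF_SIZE];
}

/* c owns a second large stack array with a lifetime that never overlaps b's. */
NOINLINE static unsigned c(unsigned seed)
{
    unsigned char buf[BUF_SIZE];
    memset(buf, (int)((seed + 1) & 0xFF), sizeof buf);
    return buf[seed % BUF_SIZE];
}

/* a calls b and c in succession, as in the scenario above. */
unsigned a(unsigned seed)
{
    unsigned x = b(seed);
    unsigned y = c(seed);
    return x + y;
}
```

As written this corresponds to Case 1. Removing `NOINLINE` from both functions (and compiling with optimization) lets the compiler try Case 2, and removing it from only one approximates Case 3, though whether a function is actually inlined is still up to the compiler. With GCC, compiling with `-fstack-usage` writes a `.su` file listing each function's stack frame size, which is one way to compare the variants.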