| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by pizlonator 2286 days ago

First of all, let’s be clear: for a lot of algorithms adding call overhead around them is free because the cpu predicts the call, predicts the return, and uses register renaming to handle arguments and return.

But that’s not the whole story. Malloc perf is about what happens on free and what happens when the program accesses that memory that malloc gave it.

When you factor that all in, it doesn’t matter how many instructions the malloc has. It matters whether those instructions form a bad dependency chain, if they miss cache, whether the memory we return is in the “best” place, and how much work happens in free (using the same metrics - dependency chain length and misses, not total number of instructions or whether there’s a call).

1 comments

jlebar 2286 days ago

> First of all, let’s be clear: for a lot of algorithms adding call overhead around them is free because the cpu predicts the call, predicts the return, and uses register renaming to handle arguments and return.

This is only thinking at the hardware layer. There are also effects at the software layer.

Function calls not known to LTO prevent all sorts of compiler optimizations as well. In addition, as Jeff says on this thread, not being able to inline free means you get no benefit from sized-delete, which is a substantial improvement on some workloads. (I'd cite the exact number, but I'm not sure Google ever released it.)

Source: I literally worked on this.

link

ahh 2286 days ago

LTO I will grant you. I don't think you need to meaningfully inline the body of delete to get sized-delete to work effectively. You do want to statically link your sized operator delete body though, obviously. As of the last time I tried this, trying to inline the actual body was difficult to make effective, though.

(parent: read username :))

link

jlebar 2286 days ago

Hi, Andrew. :)

Agree, I shouldn't have been loose wrt the distinction between inlining and LTO. I think I also ran that experiment with inlining sized-deletes and came to the same (surprising, to me) conclusion.

link

pizlonator 2286 days ago

Let’s be clear: even without LTO, the compiler knows what malloc and free do at a semantic level.

I buy the sized delete win, I’ve seen that too, in some niche cases. But that’s a far cry from the claim that not inlining malloc is bananas. If that’s all you got then this sounds like a stunning retreat from the thesis of Jeff’s post. You’re just saying that you’ve got a free that benefits from size specialization, which has nothing to do with whether malloc got inlined.

link

jlebar 2286 days ago

> even without LTO, the compiler knows what malloc and free do at a semantic level.

  int *a = ...;
  int b = *a;
  malloc(42);
  b = *a;
  printf(b);

If malloc is an opaque function, can the compiler nonetheless eliminate the second `b = *a` load as dead because it "knows what malloc does"? Certainly not, right?

link

jepler 2286 days ago

https://godbolt.org/z/z-92qR

Interestingly, the compiler emits no call to malloc, even. What!?

link

steerablesafe 2285 days ago

With -fno-builtin. It's still "cheating" as b is local.

https://godbolt.org/z/U5VGxt

Without cheating:

https://godbolt.org/z/rzcXg2

It doesn't optimize the two stores to b away.

link

gpderetta 2286 days ago

The compiler knows the semantics of malloc. Malloc as it appear in the source code doesn't even need to map 1:1 to calls to the library function, the compiler can remove and coalesce calls at will.

link

ckennelly 2285 days ago

You can still get a benefit from sized delete, even without inlining.

link