by mstromb 4227 days ago
Why would linking order affect runtime performance? Something to do with the interaction between offsets and cache, maybe?

Would it be possible to determine ahead of time what order would maximize performance, or would that require profiling?

3 comments

I'd speculate that if you're unlucky about link order, more hot cache lines than there are ways may get mapped to the same set in an N-way set-associative cache -- whereas if you're lucky, they end up going to different sets and don't continuously evict each other.
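To make the conflict scenario concrete, here is a minimal sketch in Python of how addresses map to cache sets. The geometry (64-byte lines, 64 sets, 8 ways, roughly a 32 KiB L1) and the example addresses are assumptions chosen for illustration, not measurements from any real binary, and real hardware may index its caches differently.

```python
from collections import defaultdict

# Assumed cache geometry: 64 sets * 8 ways * 64 bytes = 32 KiB.
LINE_SIZE = 64   # bytes per cache line
NUM_SETS = 64    # number of sets
WAYS = 8         # lines that can live in one set at the same time


def set_index(addr, line_size=LINE_SIZE, num_sets=NUM_SETS):
    """Index of the cache set that the line containing `addr` maps to."""
    return (addr // line_size) % num_sets


def oversubscribed_sets(hot_addrs, ways=WAYS):
    """Return {set index: distinct hot lines} for sets holding more than `ways` lines."""
    lines_per_set = defaultdict(set)
    for addr in hot_addrs:
        lines_per_set[set_index(addr)].add(addr // LINE_SIZE)
    return {s: len(lines) for s, lines in lines_per_set.items() if len(lines) > ways}


# Hypothetical "unlucky" layout: nine hot objects spaced exactly 4096 bytes
# apart (LINE_SIZE * NUM_SETS), so every one of them lands in the same set.
unlucky = [0x10000 + i * 4096 for i in range(9)]

# Hypothetical "lucky" layout: the same objects, each shifted by one more
# cache line than the previous one, so they spread over nine different sets.
lucky = [0x10000 + i * (4096 + 64) for i in range(9)]

print(oversubscribed_sets(unlucky))  # {0: 9}
print(oversubscribed_sets(lucky))    # {}
```

In the unlucky layout nine hot lines compete for eight ways, so at least one is evicted on every pass through the working set. A different link order that shifts a few of those objects by one cache line removes the conflict.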

With regard to alignment... do linkers typically pack objects so tightly that the start of each object isn't aligned on a cache line boundary? AFAIK cache lines are typically 32, 64, or 128 bytes.

Cacheline boundary?

Probably, because cache line sizes are an implementation detail, not part of the architectural specification.

Linking order will affect code and static data order and cache alignment in turn. Similarly, environment variables are pushed on the stack by the kernel. Their size will affect alignment of application data on the stack.
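As a rough illustration of the environment-variable effect, here is a toy model in Python. It assumes the environment strings are copied just below a fixed stack top and the stack pointer is then rounded down to 16 bytes. The `STACK_TOP` address and both helper functions are hypothetical. A real kernel also places argv, the auxiliary vector, and random padding on the stack, so treat this as a sketch of the mechanism only.

```python
# Toy model, not a real kernel layout: all addresses here are made up.
STACK_TOP = 0x7FFF_FFFF_F000  # hypothetical top of the initial stack


def initial_sp(env, stack_top=STACK_TOP, align=16):
    """Stack pointer after copying "KEY=VALUE\\0" strings below `stack_top`."""
    env_bytes = sum(len(k) + 1 + len(v) + 1 for k, v in env.items())
    return (stack_top - env_bytes) & ~(align - 1)  # round down to `align`


def cache_line_offset(addr, line_size=64):
    """Position of `addr` within its cache line, assuming 64-byte lines."""
    return addr % line_size


short_env = {"USER": "a"}       # 7 bytes of environment
long_env = {"USER": "a" * 40}   # 46 bytes of environment

print(cache_line_offset(initial_sp(short_env)))  # 48
print(cache_line_offset(initial_sp(long_env)))   # 16
```

Under these assumptions, growing one variable by 39 bytes moves everything placed on the stack afterwards from offset 48 to offset 16 within its cache line. The program is unchanged, but its stack data now straddles cache lines differently.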

Maybe there are other causes as well.

> Would it be possible to determine ahead of time what order would maximize performance, or would that require profiling?

I think at the very least, you'd need profiling to determine the hot code path, and that can change depending on input...

Guessing, locality of reference might play a role.
That, the branch predictors, and caching behaviors, n-way, alignment, etc.

It probably explains why different runs vary so widely. I always thought it was other things going on in the OS; I never really thought about the caches, etc.

> That, the branch predictors, and caching behaviors, n-way, alignment, etc.

Those all fall under locality of reference, btw. But yeah, of the things in that list, cache and branch prediction play a huge role.

One thing that stung me in the past was OS scheduling.