Hacker News
by vlmutolo 1421 days ago
> I still use Cachegrind, Callgrind, and DHAT all the time. I’m amazed that I’m still using Cachegrind today, given that it has hardly changed in twenty years. (I only use it for instruction counts, though. I wouldn’t trust the icache/dcache results at all given that they come from a best-guess simulation of an AMD Athlon circa 2002.)

I'm pretty sure I've seen people using the icache/dcache miss counts from valgrind for profiling. I wonder how unreliable these numbers are.


https://sqlite.org/cpu.html#microopt

> Cachegrind is used to measure performance because it gives answers that are repeatable to 7 or more significant digits. In comparison, actual (wall-clock) run times are scarcely repeatable beyond one significant digit [...] The high repeatability of cachegrind allows the SQLite developers to implement and measure "microoptimizations".
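To make "repeatable to N significant digits" concrete, here is a rough sketch that estimates how many leading digits two measurements share, using their relative difference. The function name and the sample numbers are hypothetical and only illustrate the arithmetic; they are not SQLite measurements.

```python
import math

def matching_sig_digits(a: float, b: float) -> int:
    """Rough count of leading significant digits on which a and b agree,
    estimated from the relative difference (an approximation, not a
    digit-by-digit comparison)."""
    if a == b:
        return 15  # cap at roughly the precision of a Python float
    rel = abs(a - b) / max(abs(a), abs(b))
    return max(0, math.floor(-math.log10(rel)))

# Hypothetical numbers, for illustration only.
instr_run1, instr_run2 = 1_234_567_890, 1_234_567_912  # simulated instruction counts
wall_run1, wall_run2 = 1.02, 1.09                       # wall-clock seconds

print(matching_sig_digits(instr_run1, instr_run2))  # 7
print(matching_sig_digits(wall_run1, wall_run2))    # 1
```

With counts that agree this closely, a change of a fraction of a percent stands out from run-to-run noise, which is what makes micro-optimizations measurable.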

There are a bunch of ways for caches to behave differently, but have they changed much over the past 20 years? That is, is the difference between [2022 AMD cache, 2002 AMD cache] significantly greater than the differences among [2002 PowerPC G4 cache, 2002 AMD cache, 2002 Intel cache]?

I would guess yes, just based on how L1/L2 (and later L3) are used and sized across all those systems. For AMD, 2002 vs 2022 is K8 vs 5800X3D, so you're comparing 1 core with 64+64KB of L1 cache and 512KB of L2 cache[1] against 8 cores (plus SMT) with 32+32KB of L1 per core, 512KB of L2 per core, and 96MB of L3.

Just managing cache accesses between L2 and L3 would, I think, be an additional consideration. Beyond that, you have to account for the actual architectural differences, and on server chips locality will matter quite a bit.

[1]: https://en.wikipedia.org/wiki/Athlon_64
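To illustrate how much the simulated geometry alone can change miss counts, here is a minimal sketch of a set-associative LRU cache fed the same access trace under two configurations. The sizes come from the numbers above; the associativities (2-way for the older L1, 8-way for the newer one), the 64-byte line size, and the 48 KiB trace are assumptions chosen for illustration. This is a toy model, not how cachegrind or real hardware is implemented.

```python
from collections import OrderedDict

class SetAssocLRU:
    """Toy set-associative cache with LRU replacement.
    No prefetching, no multiple levels, no coherence."""

    def __init__(self, size_bytes, ways, line_bytes=64):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (ways * line_bytes)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, addr):
        line = addr // self.line_bytes
        s = self.sets[line % self.num_sets]
        if line in s:
            s.move_to_end(line)      # hit: mark as most recently used
            return
        self.misses += 1
        if len(s) >= self.ways:
            s.popitem(last=False)    # evict the least recently used line
        s[line] = True

def count_misses(cache, addrs):
    for a in addrs:
        cache.access(a)
    return cache.misses

# Hypothetical trace: walk a 48 KiB buffer twice, one access per 64-byte line.
trace = list(range(0, 48 * 1024, 64)) * 2

old_l1 = SetAssocLRU(64 * 1024, ways=2)  # assumed 2002-era geometry
new_l1 = SetAssocLRU(32 * 1024, ways=8)  # assumed 2022-era geometry

print(count_misses(old_l1, trace))  # 768: the buffer fits, so the second pass hits
print(count_misses(new_l1, trace))  # 1536: the buffer exceeds 32 KiB, so LRU thrashes
```

The same program looks twice as bad under one geometry as under the other, so miss counts from a simulated 2002 cache say little about a working set that falls between the two sizes.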

I don't know how sophisticated the streaming/prefetch/access-pattern prediction in 2002 CPUs was.

I'm speculating, but if that's not modeled, cachegrind may be pessimistic about patterns that are predictable but not trivially simple, reporting a lot of expected misses where the CPU would have been able to prefetch the data.
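A rough sketch of that effect, assuming a simulator that counts only demand misses: the toy model below runs the same sequential stream through a small fully associative LRU cache with and without an idealized next-line prefetcher. The function, its parameters, and the instant prefetch are all hypothetical simplifications. Real prefetchers have latency and limited accuracy, and this is not a description of what cachegrind actually does.

```python
from collections import OrderedDict

def simulate(addrs, capacity_lines=512, line_bytes=64, next_line_prefetch=False):
    """Count demand misses in a toy fully associative LRU cache.
    With next_line_prefetch=True, every access also pulls in the following
    line instantly (an idealized prefetcher)."""
    cache = OrderedDict()
    misses = 0

    def fill(line):
        if line in cache:
            cache.move_to_end(line)        # refresh LRU position
        else:
            if len(cache) >= capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
            cache[line] = True

    for addr in addrs:
        line = addr // line_bytes
        if line not in cache:
            misses += 1                    # demand miss
        fill(line)
        if next_line_prefetch:
            fill(line + 1)
    return misses

stream = range(0, 1 << 20, 8)  # sequential 8-byte reads over 1 MiB

print(simulate(stream))                           # 16384: one miss per 64-byte line
print(simulate(stream, next_line_prefetch=True))  # 1: only the very first line misses
```

A model without the prefetcher reports one miss per line for a pattern that would be nearly free on hardware that prefetches well, which is the kind of overcounting speculated about above.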

Agreed, I suspect it'd be most accurate to say the SQLite folks are minimizing their working set.

I picked a couple of random performance commits out of their code repo, and they look like they might keep 1 or 2 lines out of i-cache: https://sqlite.org/src/info/f48bd8f85d86fd93 https://sqlite.org/src/info/390717e68800af9b