by janwas 1474 days ago
Thanks for the instruction table, I hadn't seen that yet. That is indeed remarkably symmetrical! It does reinforce the "half of AVX-512" result: we sustain IPC=2 on Skylake (with 512-bit vectors), and it looks like M1 would sustain 4 (with 128-bit vectors).
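
To spell out the arithmetic behind "half of AVX-512", here is a rough sketch. It only multiplies sustained vector instructions per cycle by vector width, using the two figures above; the function name is made up for illustration, and it ignores clock frequency, instruction mix, and memory stalls.

```python
def simd_bits_per_cycle(vector_instr_per_cycle: int, vector_width_bits: int) -> int:
    """Peak vector bits processed per cycle (ignores clock speed and memory stalls)."""
    return vector_instr_per_cycle * vector_width_bits

# Figures taken from the comment above; the M1 number is an estimate, not a measurement.
skylake = simd_bits_per_cycle(2, 512)  # IPC=2 with 512-bit AVX-512 vectors
m1 = simd_bits_per_cycle(4, 128)       # IPC=4 with 128-bit NEON vectors

ratio = m1 / skylake
print(skylake, m1, ratio)  # 1024 512 0.5
```

So per cycle, M1 would move about half the vector bits of Skylake with AVX-512, assuming both sustain those IPC figures.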

Huh, that's surprising. That plot indeed looks like a core might be grabbing more than 'its share' of L2, though not all of it. The 'full random' curve starts creeping up after ~3MB as expected, so the situation seems to be even more complex than "use up to 8MB".

For completeness I'll also measure 100M elements on a single core, though on M1 that wouldn't make a difference because, as you say, a single core can drive a lot of memory bandwidth, enough that NEON becomes the bottleneck.