
by endorphone 2385 days ago

"So AMD runs at 2/3 the IPC of an old Intel processor. That is quite poor!"

That is most certainly an overreach. An extraordinary overreach. Worse, it's absurdly using an AVX2 codebase, optimized for Westmere, as the baseline for "IPC" testing? The premise itself borders on gross negligence.

IPC as a generalized concept applies to a broad, general-purpose mix of instructions, not an absurdly narrow test.

Saying "Intel is faster at AVX512" is going to surprise exactly no one, and also happens to be irrelevant for the overwhelming majority of users and uses.

The microbenchmarking thing has gone on for years, and at this point anyone who has paid any attention is rightly cautious about stomping their feet and making declarations, because usually they're just pouring noise into the mix. Lazily running a couple of tiny tests is not the rigour needed to avoid deserved criticism.


BeeOnRope 2385 days ago

I'm not sure if you were implying it or just using it as an example of another type of unhelpful claim, but this test does not involve AVX-512.

I agree using Westmere isn't necessarily the best approach, but there is no difference in this case with either -march=native or -march=znver1.

The loop is small and simple, with only 9 instructions and compiles more or less the same regardless of march setting (I observed some basically no-op changes such as a mov and blsr swapping places). Here's the assembly (for the second test, with the bigger IPC gap):

    .top:
    tzcnt  r8,rcx
    add    r8d,edx
    mov    DWORD PTR [rdi+rax*4],r8d
    mov    eax,DWORD PTR [rsi]
    inc    eax
    blsr   rcx,rcx
    mov    DWORD PTR [rsi],eax
    jne    .top
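For anyone who doesn't read assembly, here is a rough C sketch of the kind of loop that could compile to something like the above. This is my reconstruction, not code taken from the article: the name `decode_word` and its parameters are hypothetical, `__builtin_ctzll` is the GCC/Clang builtin, and the register mapping in the comments is only my reading of the listing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the article): writes the position of every
 * set bit in `word`, offset by `base`, into `out`, and bumps the counter
 * that `count` points to once per bit found. */
static void decode_word(uint64_t word, uint32_t base,
                        uint32_t *out, uint32_t *count) {
    while (word != 0) {
        /* Index of the lowest set bit (tzcnt in the listing). */
        uint32_t bit = (uint32_t)__builtin_ctzll(word);
        /* Add the offset and store at the current position (add + mov). */
        out[*count] = base + bit;
        /* The counter lives in memory, so it is loaded, incremented and
         * stored on every iteration (mov, inc, mov). */
        *count += 1;
        /* Clear the lowest set bit (blsr); the loop ends when none remain. */
        word &= word - 1;
    }
}
```

Since `out` and `count` have the same pointer type, the compiler probably has to assume they may alias, which would explain why the counter makes a round trip through memory on each iteration instead of staying in a register.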


endorphone 2384 days ago

"I'm not sure if you were implying it or just using it as an example of another type of unhelpful claim, but this test does not involve AVX-512."

Even worse! Is this a defense? Because it's remarkably unhelpful as one.

The blog post was clearly a cry for attention for some project -- let's just use some clickbait IPC claims to gain it -- and continually alluded to a whole project -- an extreme niche project that still wouldn't have any relevance. But instead it's a meaningless, completely misrepresentative micro-loop.


BeeOnRope 2384 days ago

My read is different than yours.

I think Daniel uses those examples because they are actual examples from projects that he is or has been working on, and he's familiar with them and actually cares about them, and because it's at least a notch more realistic than something totally synthetic.

It seems like a very roundabout thing to use as a cry for attention for SIMDjson (the project I assume you are talking about), and I don't believe that's the purpose. I see no problem in linking the project.

Picking two random benchmarks and trying to extract any kind of more general IPC claim is not on solid ground, but I'm pretty sure Daniel will say he's not doing that: he's only sharing these two specific results. That's a style that recurs across several entries in that blog, however, so if it triggers you (as it has me on occasion) you might want to look elsewhere.


Tempest1981 2385 days ago

Doesn't that sentence refer only to the table above, measuring "bitset decoding" with a basic decoder, comparing 1.4 to 2.1 IPC?

It would help if the blog post had some headings to separate the benchmarks and summary.


BeeOnRope 2385 days ago

A plain reading indicates that yes, he's only referring to the last benchmark, which showed the 2/3 disparity.


endorphone 2384 days ago

A plain reading indicates that such is irrelevant, because these are the two tiny cases that he selectively chose to demonstrate the "IPC gap" of AMD. If some AMD booster posted hand-selected micro-benchmarks that gave AMD a lead, and boasted with exclamations and pejoratives how terrible the alternative is, we would rightly question it. This deserves no more.

And to the other defense of "Well there are AMD people claiming the same in reverse, so that legitimizes this", I've seen exactly zero of those posts on here. None. They would be laughed off the site.

What we do have is that traditionally, at a given frequency, AMD has long trailed per core on major benchmarks of significant, user-realistic loads. This is the first generation in a long time where it actually doesn't, and where you don't need additional cores to make up the gap.


BeeOnRope 2384 days ago

I feel like you are intentionally being thick in order to get mad at me.

I am only talking specifically about the 2/3 claim at the bottom of the article, which, for the avoidance of doubt, is simply a summary of the final measurement made in the article, i.e., the result of dividing 1.4 by 2.1. I know this because of its positioning in the article, because the numbers line up, because a different % IPC is given for the earlier measurement, and because an earlier version of the post, with different results for the last experiment (with IPC of 2.8 and 1.4), showed a different ratio (50%).

How you are somehow interpreting a small clarification of the one line that was being discussed as a wide-ranging defense of the article, I'm not sure. My broader thoughts are available here [1] and in the comments on the article.

---

[1] https://news.ycombinator.com/item?id=21724780
