Hacker News
by blattimwind 2178 days ago
For what it's worth, Intel is still faster in most applications, simply by virtue of having a clock speed advantage that far exceeds any IPC difference, and also by having much lower memory latency. AMD has roughly 20-30 ns of extra latency compared to Intel: with good memory you can get ~45 ns on current Intel CPUs, but the same memory will give you ~65 ns on a Ryzen. That's significant for a lot of code (e.g. pointer chasing, complex logic, etc.).
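A back-of-envelope model of why that gap matters for pointer chasing: when every load depends on the previous one, the misses can't overlap, so runtime is roughly misses × latency. This is a sketch, not a measurement. The 45 ns and 65 ns inputs are the figures quoted above, and the one million misses is an arbitrary illustrative count.

```python
def chase_time_ms(misses: int, latency_ns: float) -> float:
    """Rough wall time (ms) for `misses` serialized DRAM misses.

    Assumes each miss must finish before the next address is known (a
    pointer chase), so there is no overlap, and ignores all other work.
    """
    return misses * latency_ns / 1e6

MISSES = 1_000_000  # illustrative count, not taken from any benchmark
intel_ms = chase_time_ms(MISSES, 45.0)  # ~45 ns figure quoted above
ryzen_ms = chase_time_ms(MISSES, 65.0)  # ~65 ns figure quoted above
print(f"{intel_ms:.0f} ms vs {ryzen_ms:.0f} ms (x{ryzen_ms / intel_ms:.2f})")
# prints: 45 ms vs 65 ms (x1.44)
```

Real code mixes misses with cache hits and independent work, so the latency-driven part of the gap is usually smaller than this 44% worst case.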

On the other hand, few applications scale efficiently beyond four cores. Yes, of course, AMD delivers more Cinebench points per dollar, and usually more Cinebench points overall, but that's not necessarily an interesting metric.
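Amdahl's law is the standard way to put numbers on "doesn't scale past four cores": if only a fraction p of a task's runtime can run in parallel, n cores give at most 1 / ((1 - p) + p/n) speedup. A minimal sketch; the p values below are illustrative, not measured from any application.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the runtime parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction  # this part runs on one core no matter what
    return 1.0 / (serial + parallel_fraction / cores)

for p in (0.5, 0.9, 0.99):
    print(p, [round(amdahl_speedup(p, n), 2) for n in (4, 8, 16)])
```

With p = 0.5, going from 4 to 16 cores only moves the bound from 1.6× to about 1.88×, while a task with p = 0.99 still gains substantially from the extra cores.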

Personally, I find that when I'm waiting on something to complete, the application in question tends to use only a tiny number of cores for the task at hand. Usually one.

Another significant weakness of AMD's current platform is idle power consumption.

These factors leave me with a much more nuanced impression than "Intel is ded" or "HOW IS INTEL GOING TO CATCH UP TO THIS????"; CPU reviews these days are pure clickbait.

The problem is that a lot of the tasks people want their CPU to be fast at are exactly the stuff that parallelizes almost embarrassingly well: compiling code, video rendering, compressing files. People buying CPUs for this are not as concerned about how many cycles it takes to jump through a vtable, as long as it's not slow.
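Compression shows why: split the input into chunks and each chunk can be compressed with no knowledge of the others. A minimal sketch using Python's standard `zlib` and a thread pool; the 1 MiB chunk size and 4 workers are arbitrary choices, and independent chunks give up a little compression ratio because each one starts with an empty history.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_chunks(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> list[bytes]:
    """Compress fixed-size chunks independently and in parallel.

    No chunk depends on any other, so the work splits across cores with
    no coordination beyond collecting the results in order.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # CPython's zlib releases the GIL while deflating, so threads can overlap.
        return list(pool.map(zlib.compress, chunks))

def decompress_chunks(parts: list[bytes]) -> bytes:
    """Undo compress_chunks: decompress each part and concatenate in order."""
    return b"".join(zlib.decompress(p) for p in parts)

payload = bytes(range(256)) * 20_000  # ~5 MB of sample data
parts = compress_chunks(payload)
print(len(parts), sum(map(len, parts)), len(payload))
```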

Meanwhile, pointing at memory latency as the flaw in Ryzen has been a popular misdirection for a while now. People warned me about it being a performance pitfall even before I bought my first Ryzen processor. In practice, it doesn’t show up as a serious issue even in the most complexity-intensive workloads. For example, Zen 2 performs very well on hardware emulation. This is possibly because what it loses in memory latency it makes up in caching and prefetching, but honestly I don’t know, and I am not sure how to measure it. In any case, it certainly compares favorably to Intel’s best chips in single-core workloads, even if it isn’t on top. Factor in price and multicore workloads, and you have exactly the reasons why people like me have been singing its praises. Intel’s single-core lead may exist in some form, but it is not what it once was: it is not an unconditional lead where an Intel core beats an AMD core. Not even close.
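One common way to measure memory latency is a pointer chase: lay out a single random cycle through an array so each load's address comes from the previous load and the prefetcher can't guess ahead, then time the steps. It's sketched in Python here for readability, with a big caveat: CPython's per-iteration overhead is large, so absolute numbers mostly reflect the interpreter, and real latency tools do the same walk in C or assembly. Growing the array past the cache sizes is what exposes the DRAM cost; the sizes and step counts below are arbitrary.

```python
import random
import time

def sattolo_cycle(n: int, seed: int = 0) -> list[int]:
    """Successor table where following nxt[i] visits all n slots in one cycle.

    Sattolo's algorithm: like a Fisher-Yates shuffle, but j is drawn strictly
    below i, which guarantees a single cycle (no short loops that stay in cache).
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt

def ns_per_step(nxt: list[int], steps: int) -> float:
    """Average nanoseconds per dependent load while walking the cycle."""
    i = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        i = nxt[i]  # the next index is only known once this load completes
    return (time.perf_counter_ns() - start) / steps

# Small (cache-resident) vs. tens of MB once CPython's int objects are counted.
for n in (1 << 10, 1 << 20):
    print(f"{n:>8} slots: {ns_per_step(sattolo_cycle(n), 1_000_000):.1f} ns/step")
```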

None of this means Intel’s dead, of course, but IMO that’s mostly because they have a lot more going on than just having the best CPU. They’ve got their dedicated GPU coming out, and plenty of ancillary technology as well. It does seem like, for a company like Intel, having to take a back seat in CPUs for a while will be painful; unlike for AMD, this is a new position for Intel, and maybe not one they will handle well.

You can get an idea of how popular different processors are in the server space by looking at the AWS EC2 spot market. Instances with top-end Xeon processors (C5 and Z1d) typically have much lower spot discounts than AMD EPYC-based instances (r5ad), although ARM c6g instances have been pushed up in price significantly over the last few months, perhaps as people switch over to them for the per-computational-unit cost savings.

Of course, this is all a function of Amazon's supply of instances and their chosen on-demand pricing, but the trends are certainly interesting, and show steady demand for fast Xeons and increasing demand for ARM. I have run some compute-heavy workloads on the best AMD instances I could find on AWS, and the per-core speed difference for my particular workload was nearly 50%, which got worse as it scaled up to bigger instances because my workload uses a lot of L3 cache. I hear about EPYCs with 256MB of L3 cache, but I can't seem to find those on AWS -- only ones with 8MB of cache.

Disclosure: I work at AWS on building cloud infrastructure

C6g instances only launched on June 11. I'm not sure what information can be gleaned from the spot prices regarding Arm demand at this time.

The C5a instances powered by AMD Rome processors have 192 MiB of L3 cache per socket total (16 MiB L3 slice per compute complex, 12 CCX per socket). You can observe this from the cpuid(1) output:

   L3 cache information (0x80000006/edx):
      line size (bytes)     = 0x40 (64)
      lines per tag         = 0x1 (1)
      associativity         = 0x9 (9)
      size (in 512KB units) = 0x180 (384)
384 * 512 KiB = 192 MiB

(you can download cpuid from http://www.etallen.com/cpuid.html)
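The same arithmetic as a small decoder for the four fields shown. It reassembles the EDX value from the printed fields rather than reading the register, and the bit positions follow AMD's documented layout for leaf 0x80000006 (size in bits 31:18, associativity code in 15:12, lines per tag in 11:8, line size in 7:0). The associativity field is an encoded value, so it's left undecoded here.

```python
def decode_amd_l3(edx: int) -> dict[str, int]:
    """Split CPUID leaf 0x80000006 EDX into its L3 fields (AMD layout)."""
    return {
        "line_size_bytes": edx & 0xFF,            # bits 7:0
        "lines_per_tag": (edx >> 8) & 0xF,        # bits 11:8
        "associativity_code": (edx >> 12) & 0xF,  # bits 15:12, encoded, not a way count
        "size_512k_units": (edx >> 18) & 0x3FFF,  # bits 31:18
    }

# EDX reassembled from the fields in the dump above (not a live register read).
edx = (0x180 << 18) | (0x9 << 12) | (0x1 << 8) | 0x40
fields = decode_amd_l3(edx)
size_mib = fields["size_512k_units"] * 512 // 1024  # 512 KiB units -> MiB
print(hex(edx), fields, f"{size_mib} MiB")
# prints: 0x6009140 {'line_size_bytes': 64, 'lines_per_tag': 1, 'associativity_code': 9, 'size_512k_units': 384} 192 MiB
```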

Thanks for the info -- I must have misinterpreted the spot pricing history chart for c6g. While you're here, does the AWS hypervisor have any means to dedicate a portion of the L3 cache to each virtualized core, or is it a free-for-all for all of the cache space (such that a noisy neighbor could potentially be evicting data held in your L2 cache or even L1 cache by thrashing the L3 cache)?

For instance types in families like C, M, and R, processor cores are dedicated to one instance, and each virtual processor is pinned 1:1 to an underlying logical processor. Therefore, no neighbor is able to use the L1 and L2 caches.

For L3 cache, we try to optimize for the best overall performance the majority of the time. Smaller instance sizes share L3 cache with other instances. I wouldn't call it a "free for all", given how the cache hierarchy has been shifting over time (e.g., Skylake-SP increased the L2 cache per core, and its L3 cache is now non-inclusive).

I want my video games, email reader, Word, YouTube, IDE and general Python code to run faster. None of those parallelize much of anything.

1. It is unlikely the CPU is a serious bottleneck in many of those circumstances. Even if a task takes a measurable amount of time, that does not mean a faster CPU will make a meaningful, or even measurable, improvement. If you think it will, try overclocking and measuring your Gmail load times.

2. Like I said, in my experience Ryzen also competes just fine in single-core workloads; it just also decimates in multicore. I’d rather have some tasks run significantly faster than have some run very slightly faster. And that disregards the fact that not all tasks are the same, and Ryzen does in fact win some categories. These CPU architectures have been more divergent than usual lately.

3. Things you think aren’t parallel are. Video games using modern graphics APIs are in fact able to exploit multicore CPUs. Browsers absolutely exploit multicore CPUs. Your system as a whole will exploit multicore CPUs, so during general usage, when you are doing more things and have more software running, single-core performance will be hurt less. And so on.

Your email reader, Word, YouTube and IDE aren't likely to push the limits of any modern CPU. Your video games are increasingly optimized around multiple cores, because modern consoles ship with multi-core CPUs and developers need all the performance they can get out of them. The only thing that might benefit from single-core performance is probably your general Python code.

Gmail and the IDE take ten seconds to load, while YouTube destroys any CPU when watching a 4K video (or 1080p on a battery-saving laptop).

YouTube is possibly the single largest root cause of users upgrading laptops over the past 10 years. They made a silent transition to 60 FPS videos last year, which cut hundreds of millions of users off from watching HD.
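A rough way to see why frame rate and resolution hit software decoding so hard: decode work grows roughly with the number of pixels produced per second. This is only a first-order proxy (codec, bitrate and hardware decode support change the real cost a lot), but it shows the scale.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels per second a decoder must produce (first-order proxy for decode work)."""
    return width * height * fps

p1080_30 = pixel_rate(1920, 1080, 30)
p1080_60 = pixel_rate(1920, 1080, 60)
p2160_60 = pixel_rate(3840, 2160, 60)
print(p1080_60 / p1080_30, p2160_60 / p1080_30)  # 2.0 8.0
```

So 1080p60 is about twice the pixel throughput of 1080p30, and 4K60 about eight times.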

Destroying the CPU in some configurations...

https://www.youtube.com/watch?v=ef1wAfrMg5I takes ~10% of one CPU on my desktop using Chrome.

OTOH, I know what you're talking about: my Linux machine hates YouTube, because even with the Chromium freeworld fork and some codec acceleration, it's still burning CPU like crazy.

So a big part of this isn't a hardware problem so much as a software one, combined with the constant fights over whose codec is the one true choice. AKA it's a YouTube and !Windows/Android+Chrome problem.

> Gmail and the IDE take ten seconds to load,

Those tasks are IO-bound, not CPU-bound.

Your concerns have no basis whatsoever.
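One rough way to check that kind of claim for code you can run in-process: compare CPU time against wall-clock time. If the process spends most of the wall time off the CPU, it is waiting (on I/O, the network, or other processes), and a faster core won't shorten that part. A minimal sketch with Python's `time.perf_counter` and `time.process_time`; it only counts the current process, so multi-process apps like a browser need per-process accounting instead.

```python
import time
from typing import Callable

def cpu_share(task: Callable[[], object]) -> float:
    """CPU time / wall time for this process while running `task`.

    Near 0: mostly waiting. Near 1: one core kept busy. Can exceed 1 if the
    task runs several threads, since process_time sums all of them.
    """
    wall0, cpu0 = time.perf_counter(), time.process_time()
    task()
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    return cpu / wall if wall > 0 else 0.0

print(f"sleep: {cpu_share(lambda: time.sleep(0.2)):.2f}")  # waiting, not computing
print(f"busy:  {cpu_share(lambda: sum(i * i for i in range(2_000_000))):.2f}")  # compute-bound
```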

> Youtube is possibly the single largest root cause for users upgrading laptops over the past 10 years.

No one in the whole world feels the need to upgrade to a high-end workstation because of YouTube videos.

> The problem is a lot of tasks that people want their CPU to be fast at is exactly stuff that parallelizes almost embarrassingly well. Compiling code, video rendering, compressing files.

Compiling code isn't embarrassingly parallel unless you're building a project with lots of files from scratch. Video rendering and compression also don't scale as well as you may think:

https://www.phoronix.com/scan.php?page=article&item=3900x-39...
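The work/span view makes the compile point concrete: even with unlimited cores, a build can't finish faster than its longest chain of dependent steps (the span), so total work divided by span bounds the useful parallelism. A minimal sketch over a hypothetical build graph; the target names and costs are made up for illustration.

```python
from functools import lru_cache

# Hypothetical build graph: target -> prerequisites, with made-up costs in seconds.
DEPS: dict[str, list[str]] = {
    "parser.h": [],                 # e.g. a generated header
    "lex.o": ["parser.h"],
    "parse.o": ["parser.h"],
    "main.o": ["parser.h"],
    "util.o": [],
    "app": ["lex.o", "parse.o", "main.o", "util.o"],  # link step
}
COST: dict[str, int] = {"parser.h": 3, "lex.o": 2, "parse.o": 4, "main.o": 1, "util.o": 2, "app": 3}

def dirty(changed: str) -> set[str]:
    """The changed target plus everything that transitively depends on it."""
    out = {changed}
    grew = True
    while grew:
        grew = False
        for target, prereqs in DEPS.items():
            if target not in out and any(p in out for p in prereqs):
                out.add(target)
                grew = True
    return out

def span_of(targets: set[str]) -> int:
    """Longest chain of dependent work among `targets`; anything else is already built."""
    @lru_cache(maxsize=None)
    def finish(t: str) -> int:
        if t not in targets:
            return 0  # up to date, costs nothing
        return COST[t] + max((finish(p) for p in DEPS[t]), default=0)
    return max(finish(t) for t in targets)

for label, todo in (("full build", set(DEPS)), ("util.c edited", dirty("util.o"))):
    work = sum(COST[t] for t in todo)
    span = span_of(todo)
    print(f"{label}: work={work}s span={span}s parallelism<={work / span:.2f}")
```

In this toy graph the from-scratch build is capped at 1.5× by the generated header and the link step, and the incremental rebuild after editing one file has no parallelism at all.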

Meanwhile, single-threaded performance affects pretty much 100% of what you do.

In the end, I don't think there's a big difference either way.

> Meanwhile, pointing at memory latency as the flaw in Ryzen has been a popular misdirection for a while now.

How is it a misdirection? The data is accurate, and memory latency scaling is a well-known issue for simulations such as games (a huge market for high-end desktop CPUs, and also the market 90% of reviews address), where you can't really explain the performance differences by higher clocks alone. It's considered the main reason why much older Intel CPUs can still outperform Ryzen CPUs in games.

On the other hand, if you take something like Cinebench, you can literally turn XMP off (thus using JEDEC timings and bus speed) and still get almost the same score (within, say, 2%). That's because Cinebench benchmarks pretty much only ALU throughput. That's obviously an important factor for performance, but just as obviously not the only one.

>Intel is still faster in most applications, simply by virtue of having a clock speed advantage that by far exceeds any IPC difference

This is already only marginally true: the difference is only about 5%, depending on the application, and in some applications AMD comes out ahead anyway. Expect the remaining difference to disappear when Zen 3 releases in a few months.

>Another significant weakness of AMD's current platform is idle power consumption.

AMD seems to have caught up here almost entirely. They've done a lot of work to improve idle power consumption lately and the node advantage probably helps, too.

Hi, this is very informative. To clarify a couple of points:
- By memory latency, do you mean the time to access an uncached portion of RAM?
- Re clock speed advantage, are you referring to the fact that AMD's turbo boost doesn't hit 5 GHz?

IIRC, L3 is slightly slower on Zen 2; main memory, as mentioned, is much slower.
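The usual way to reason about whether bigger caches can offset slower memory is average memory access time (AMAT): hit time plus miss rate times miss penalty. A minimal sketch; treating the quoted 45/65 ns as the miss penalty is a simplification, and the 10 ns L3 hit time and 10% miss rate are hypothetical inputs, not measurements of either vendor.

```python
def amat_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_ns + miss_rate * miss_penalty_ns

def break_even_miss_rate(hit_a: float, miss_a: float, penalty_a: float,
                         hit_b: float, penalty_b: float) -> float:
    """Miss rate system B needs so that its AMAT equals system A's."""
    return (amat_ns(hit_a, miss_a, penalty_a) - hit_b) / penalty_b

# Hypothetical: same 10 ns L3 hit time; A misses 10% of the time to 45 ns DRAM.
m_b = break_even_miss_rate(10.0, 0.10, 45.0, 10.0, 65.0)
print(f"B must miss <= {m_b:.1%} of the time (vs 10.0% for A)")
```

With equal hit times, B needs about 31% fewer misses (45/65 of A's miss rate) to break even; a slower L3 on B pushes that requirement lower still.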

Clock speed advantage -- most Zen 2 CPUs don't overclock to 4.5 GHz on any core, let alone all-core. The boost numbers are reached with current firmware, but only for the tiniest fractions of a second and never under any real load. Sustained single-core boost frequencies are 200-400 MHz lower than the specified boost frequency. Intel CPUs, on the other hand, consistently reach their boost frequencies under load, and most of them can run their single-core boost as an all-core overclock under load (with much greater power consumption, of course).

In practice, this means that for equivalently priced parts (e.g. 3900X vs 10900K), the AMD part will have about a GHz lower clock in lightly threaded workloads, which are most workloads. With Intel's reference settings, the Intel and AMD parts have about the same sustained clocks (3.8-4 GHz) under all-core load, but with the defaults of many motherboards, the Intel part will run at 4.8-5 GHz, depending on cooling.
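A simple way to frame the clock-vs-IPC trade-off: single-thread throughput is roughly IPC × sustained clock, so a core clocked about a GHz lower needs a matching IPC edge just to break even. A minimal sketch; 4.2 and 5.2 GHz are illustrative values consistent with "about a GHz lower", not measured clocks, and the model treats IPC as fixed, which it isn't for memory-bound code (a fixed latency in ns costs more cycles at a higher clock, so IPC falls).

```python
def relative_perf(ipc_a: float, ghz_a: float, ipc_b: float, ghz_b: float) -> float:
    """Single-thread throughput of core A relative to core B, modelled as IPC x clock."""
    return (ipc_a * ghz_a) / (ipc_b * ghz_b)

def ipc_ratio_to_break_even(ghz_slow: float, ghz_fast: float) -> float:
    """IPC ratio the lower-clocked core needs to match the higher-clocked one."""
    return ghz_fast / ghz_slow

need = ipc_ratio_to_break_even(4.2, 5.2)  # illustrative clocks, ~1 GHz apart
print(f"needs {need:.2f}x the IPC")        # needs 1.24x the IPC
# A hypothetical 10% IPC edge is not enough at these clocks:
print(f"{relative_perf(1.10, 4.2, 1.00, 5.2):.2f}")  # 0.89
```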

>In practice this means that for equivalently priced parts (e.g. 3900X vs 10900K) the AMD part will have about a GHz lower clock for lightly threaded workloads

They're only "equivalently priced" when you're talking about MSRP. Right now the 3900X sells for $413 and is in stock, whereas the i9-10900K sells for $530 and is out of stock.