by DrPizza 5318 days ago
> This article is impressively bad.

Cheers!

> While the 8-module chip does share a few things (mainly a vector processing unit, that becomes two when doing the 128-bit SSE operations)

A few things? No, it shares a lot of things. The entire floating point and SIMD unit. The entire front-end. The branch predictor is, I believe, a weird hybrid of shared and non-shared. The I-cache and the L2 cache are also both shared.

The front-end is particularly troublesome. The entire decoder can service either one thread or the other in a given cycle. If both threads need instructions, the best it can do is round robin between them. This averages out to just two instructions per cycle per thread: less decode bandwidth than K10.
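As a rough back-of-the-envelope sketch of the round-robin point above: the decoder widths below are assumptions, not figures stated in this thread (a 4-wide shared decoder per module, which is what the "two instructions per cycle" average implies, and a 3-wide dedicated decoder per K10 core). The function name is made up for illustration.

```python
from fractions import Fraction

# Assumed widths (not stated in the thread): a 4-wide decoder shared by the
# two threads of a module, and a 3-wide dedicated decoder per K10 core.
SHARED_DECODE_WIDTH = 4
K10_DECODE_WIDTH = 3


def per_thread_decode_rate(width: int, active_threads: int) -> Fraction:
    """Average instructions decoded per cycle per thread under strict round robin."""
    if active_threads < 1:
        raise ValueError("need at least one active thread")
    # The whole decoder serves one thread per cycle, so each thread gets
    # the full width on 1/active_threads of the cycles.
    return Fraction(width, active_threads)


one_busy = per_thread_decode_rate(SHARED_DECODE_WIDTH, 1)
both_busy = per_thread_decode_rate(SHARED_DECODE_WIDTH, 2)

print(f"one thread active:   {one_busy} instructions/cycle")
print(f"both threads active: {both_busy} instructions/cycle per thread")
print(f"K10 core (assumed):  {K10_DECODE_WIDTH} instructions/cycle")
```

Under these assumptions, a thread only falls below the K10 figure when its sibling thread is also fetching; with the sibling idle it gets the full shared width.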

Likewise the integer units: there are fewer ALUs and AGUs per thread than in K10. Likewise the floating point unit. There's lots of sharing, and even the private, non-shared parts are resource-starved.

> But, they'll have sse contention if they schedule more than 8 256-bit vector operations (sadly intel won't bring this instruction set to market for a bit).

SSE contention will occur if a thread can issue more than two SSE operations per cycle, or one AVX operation per cycle.

> Bulldozer is pretty cool, but sadly the tech press decides to shit on the underdog in a market that multiple companies have successfully sued the monopolist for anti-competitive behavior. :(

I don't care about "the underdog" or which is "cool" or which multi-billion dollar corporation you might prefer. I care about which works better. It ain't Bulldozer.


Benchmarks at http://www.phoronix.com/scan.php?page=article&item=amd_f... appear to paint a picture of a much more balanced performance profile for Bulldozer chips. It does well in threaded applications and where code is recompiled for it.

I cringed a bit when I saw this on Ars Technica: the linkbaity headline, the image of a burning bulldozer, the lack of any benchmarks that you ran yourself, and the fact that the data is presented in a lopsided fashion. Here are a few examples:

a) If you look at the actual prices for the Xeon system and the AMD system, you can see that the price of each system is dominated by the cost of the SSD. Of the ~1.5 million pre-discount price of the AMD system, nearly 1.2 million is for the SSD, while in the Xeon system 485k of the roughly 740k price is the SSD. Penalising AMD for that seems unfair. It also remains unclear what the SSD in the AMD system, at double the cost of the Xeon SSD, does for performance.

b) In the SPEC JBB2005 section, where the Bulldozer 6200 scores 1.25 million bops, the 6100 gets 0.981 million, and the Xeon has 0.975 million, you explain away the high performance by saying that it exists only because of a higher number of cores.

c) For the SAP section - "the 6200 scores 31,720 SAPS, the 6100 scores 24,020, and the Xeon gets 28,480. The 6200 system, with 33 percent more processors than the 6100 system, gets 32 percent more performance." Here's a test that clearly contradicts your "Bulldozer is abysmal" narrative.

d) In the end you write - "AMD is boasting that Opteron 6200 is the "first and only" 16-core x86 processor on the market. Not only is this not really true (equating threads and cores is playing fast and loose with the truth), it just doesn't matter." - except in the SPEC JBB2005 test, where you yourself said that "But these results are still cause for some concern. The 6200 part has 33 percent more cores than the 6100 part, as well as a minor clock speed advantage. Its performance in this CPU-stressing benchmark is only 27 percent greater than that of the 6100."

e) Next time please run some benchmarks of your own.
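The price split in point a) can be checked with a few lines of arithmetic. This is only a sketch using the approximate pre-discount figures quoted in the comment; the thread gives no exact prices, and the variable and function names are made up for illustration.

```python
# Approximate pre-discount figures as quoted in the comment above. The thread
# does not give exact prices, so these are round numbers, not real quotes.
systems = {
    "AMD": {"total": 1_500_000, "ssd": 1_200_000},
    "Xeon": {"total": 740_000, "ssd": 485_000},
}


def breakdown(total, ssd):
    """Return (SSD share of the total price, cost of everything except the SSD)."""
    return ssd / total, total - ssd


summary = {name: breakdown(**prices) for name, prices in systems.items()}

for name, (share, rest) in summary.items():
    print(f"{name}: SSD is {share:.0%} of the price; everything else is about {rest:,}")
```

With the storage stripped out, the two systems are much closer in price than the headline totals suggest, which is the point the comment is making.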

The Phoronix benchmarks, like most others, suggest that the only area where Bulldozer appears at all competent is HPC. To describe this as niche is an understatement.

a) I agree it remains unclear how much difference the SSD makes. That's why I don't think it's a useful demonstration of Bulldozer's performance _even though AMD is citing it as such_.

b) Yes, I do. That 1/3 more cores gives 1/3 more performance in a test that scales almost perfectly means that the per-core performance has stood still. A 32 nm K10.5 chip with 1/3 more cores would perform just as well, cost less to build, use less power, and eliminate the performance regressions. So what is the point of Bulldozer?

c) No, it reinforces the "Bulldozer performs no better than a scaled up K10.5 system would and hence is pointless" narrative.

d) @_@

e) No. I don't have a half million dollars of equipment just lying around so that I can run TPC-C (etc.) myself.
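The per-core argument in b) and c) can be sketched numerically from the scores quoted earlier in the thread. The 4/3 core ratio is the "33 percent more cores" figure from the thread, not an independently verified core count; the 6200's minor clock speed advantage is ignored, and the helper name is made up for illustration.

```python
# Scores as quoted in the thread: (Opteron 6200, Opteron 6100).
scores = {
    "SPECjbb2005 (bops)": (1_250_000, 981_000),
    "SAP (SAPS)": (31_720, 24_020),
}

# Assumption taken from the thread: the 6200 has one third more cores.
CORE_RATIO = 4 / 3


def per_core_ratio(new_score, old_score, core_ratio=CORE_RATIO):
    """Per-core throughput of the new part relative to the old (1.0 = unchanged)."""
    return (new_score / old_score) / core_ratio


results = {}
for name, (opteron_6200, opteron_6100) in scores.items():
    total_gain = opteron_6200 / opteron_6100 - 1
    ratio = per_core_ratio(opteron_6200, opteron_6100)
    results[name] = ratio
    print(f"{name}: total {total_gain:+.1%}, per core {ratio - 1:+.1%}")
```

Both per-core ratios come out slightly below 1.0, which matches the claim that the extra throughput is accounted for by the extra cores alone.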

My reading of the Phoronix article suggests that Bulldozer does fairly well on the following tests: a) ffmpeg encoding, b) parallel I/O, c) x264 encoding, d) compression, e) MP3 encoding, f) c-ray rendering, g) smallpt.

I will concede that I know virtually nothing of which workloads are representative of what percentage of the market.

I don't think most server systems are doing much in the way of MP3 or H.264 encoding.

Rendering is more or less equivalent to HPC. Different markets, but similar problem sets (lots of computation, minimal communication or dependencies between threads).

None of those are particularly relevant to typical server workloads; servers are doing things like querying databases, spitting out Web pages, running Java VMs, running virtualization software, that kind of thing.

Thanks for the reply, but your piece wasn't balanced and was well below the quality standards I used to hold for your site (always re-balance expectations!). You didn't talk about power consumption or anything interesting about the platform. We can get press releases from Intel.

What is there that is "interesting" about the platform?

Power consumption was mentioned at a number of points in the article. It's just there's not a whole lot to say about it--it's not exactly a strength of the architecture.

Are you the author?

Yes.