Hacker News
by jsn 5347 days ago
From http://www.theregister.co.uk/2011/11/01/hp_redstone_calxeda_... :

The sales pitch for the Redstone systems, says Santeler, is that a half rack of Redstone machines and their external switches implementing 1,600 server nodes has 41 cables, burns 9.9 kilowatts, and costs $1.2m.

A more traditional x86-based cluster doing the same amount of work would only require 400 two-socket Xeon servers, but it would take up 10 racks of space, have 1,600 cables, burn 91 kilowatts, and cost $3.3m.
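
A minimal sketch that normalizes the quoted totals per node, taking the article's figures at face value (node counts, kilowatts, prices). The dictionary layout and the `per_node` helper are illustrative, not from the article.

```python
# Figures quoted from The Register article for the two configurations.
REDSTONE = {"nodes": 1600, "cables": 41, "kw": 9.9, "cost_usd": 1.2e6}
XEON = {"nodes": 400, "cables": 1600, "kw": 91.0, "cost_usd": 3.3e6}


def per_node(config):
    """Return (watts per node, dollars per node) for one configuration."""
    watts = config["kw"] * 1000 / config["nodes"]
    dollars = config["cost_usd"] / config["nodes"]
    return watts, dollars


for name, cfg in (("Redstone", REDSTONE), ("Xeon", XEON)):
    w, d = per_node(cfg)
    print(f"{name}: {w:.1f} W/node, ${d:,.0f}/node")
```

This gives about 6.2 W and $750 per Redstone node versus 227.5 W and $8,250 per Xeon server, so the comparison hinges on how many Redstone nodes it takes to match one Xeon server on a given workload.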

Hmm, let's see. That's about $7-8k per Xeon server, something like an HP ProLiant DL360R07 (2 x 6-core Xeons at 2.66GHz). That gives 3 times as many cores as Redstone, each clocked 2.66 times higher and executing more instructions per clock, too. And that's without hyperthreading.
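
The same back-of-envelope comparison as code, under the post's assumptions: one core per Redstone node, and a Redstone clock of 1.0 GHz (inferred here from the "2.66 times" figure, not taken from a spec sheet). It deliberately ignores instructions per clock and hyperthreading, both of which would widen the gap further.

```python
# Aggregate "core-GHz": a crude upper bound on raw compute that ignores IPC,
# hyperthreading and every non-CPU bottleneck.

def core_ghz(nodes, cores_per_node, ghz):
    """Total cores multiplied by clock speed in GHz."""
    return nodes * cores_per_node * ghz

# Xeon side: 400 servers, 2 sockets x 6 cores each, 2.66 GHz.
xeon = core_ghz(400, 2 * 6, 2.66)
# Redstone side as assumed in the post: one core per node at 1.0 GHz
# (an assumption inferred from the post, not a Calxeda spec).
redstone = core_ghz(1600, 1, 1.0)

print(f"Xeon: {xeon:.0f} core-GHz, Redstone: {redstone:.0f} core-GHz, "
      f"ratio {xeon / redstone:.2f}x")
```

Under these assumptions the Xeon cluster has roughly 8x the raw clock capacity; whether that matters depends on whether the workload is compute bound at all.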

Am I missing something big, or is the Redstone solution neither cost-effective nor energy-efficient?

You assume the application is compute limited and that the extra performance on the Xeon translates into extra performance on a given application. That's probably not a good assumption for this kind of workload.
Why, for embarrassingly parallel workloads (like the ones they mention), it's a perfectly reasonable assumption. And for anything less parallel, a gazillion ARM nodes are all but useless.
The article mentions Hadoop, big data crunching, web serving and web caching. They may or may not be embarrassingly parallel, but that doesn't mean any of them are typically compute bound.

Look, today's multichip, multicore servers tend to be unbalanced for a lot of workloads. Their massive compute performance often burns power waiting for main memory, disk or network.

You're going to be I/O bound (network or disk), memory bound, or compute bound. It's hard to imagine the Redstone systems besting Xeon based servers in any of the three.
It depends entirely on where your bottlenecks are. If the bottleneck is entirely within your node, then this isn't going to be compelling. If you're doing something that's very light on the resources within your node (serving static content, etc.) and your bottleneck is some other system somewhere else, then these sorts of machines could be compelling purely from a space/power point of view.
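
A toy version of this bottleneck argument, assuming each resource caps a node's sustained request rate independently and that a shared external system caps the whole cluster. Every rate below is a made-up illustration, not a measurement of Xeon or Calxeda hardware, and the function names are hypothetical.

```python
# Toy bottleneck model: a node's sustained rate is capped by whichever
# resource saturates first. All rates are hypothetical, in requests/second.

def node_throughput(cpu_rate, mem_rate, io_rate):
    """Requests/s one node can sustain: the minimum of its per-resource limits."""
    return min(cpu_rate, mem_rate, io_rate)


def cluster_throughput(nodes, cpu_rate, mem_rate, io_rate, external_limit):
    """Cluster rate, further capped by a shared external system (e.g. a backend)."""
    return min(nodes * node_throughput(cpu_rate, mem_rate, io_rate), external_limit)


# Hypothetical: fast nodes vs. many slow nodes, both behind the same external cap.
big = cluster_throughput(400, cpu_rate=50_000, mem_rate=20_000,
                         io_rate=5_000, external_limit=1_000_000)
small = cluster_throughput(1600, cpu_rate=4_000, mem_rate=3_000,
                           io_rate=1_500, external_limit=1_000_000)
print(big, small)
```

Both hypothetical clusters land on the same 1,000,000 requests/s because the external limit binds first; at that point the cheaper, lower-power cluster wins on space and power alone.
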
If your nodes are not bound by any local resource, you might as well run them in virtualization containers on a Xeon. That setup would be even more flexible than one built on (less powerful) ARMs.
But not nearly as space/power-efficient.
If your workload runs on one or two Xeon servers, it probably isn't worth considering something like this. If your workload runs on racks of Xeon servers, it might be.

Then the question is, which hardware delivers the right balance of CPU, memory and IO bandwidth for the lowest capital and operating costs.

Also, for what it's worth, each card has 60Gbps of general I/O bandwidth and another 48Gbps of SATA disk bandwidth.

Even if you triple the number of Redstone machines, you'll still use only about a third of the energy and under 8% of the cabling.
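
A quick check of that arithmetic against the quoted figures (9.9 kW and 41 cables for Redstone, 91 kW and 1,600 cables for the Xeon cluster). It assumes power and cabling scale linearly with the number of Redstone machines, which ignores any extra switching a larger setup might need.

```python
# Quoted totals for each configuration.
redstone_kw, redstone_cables = 9.9, 41
xeon_kw, xeon_cables = 91.0, 1600

factor = 3  # triple the Redstone configuration (linear scaling assumed)
energy_share = factor * redstone_kw / xeon_kw
cable_share = factor * redstone_cables / xeon_cables
print(f"energy: {energy_share:.1%}, cabling: {cable_share:.1%}")
```

That works out to about 32.6% of the power and 7.7% of the cables.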

And every 4 ARM cores get their own memory channels and I/O ports, vs. every 6-12 cores on the Xeon [corrected] (the point being that CPU speed is not the only variable here).

The Calxeda chip is quad-core, so there's still sharing.
My bad. The tray picture shows 36 boards; I didn't look closely and thought those were 72 single-core nodes.
By my calculations, the Redstone config has 6,400 cores and the traditional one has 4,800. But discussing such vague claims is pretty pointless anyway.
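
The core counts can be reproduced from the figures quoted at the top of the thread, assuming one quad-core Calxeda chip per Redstone node and two 6-core Xeons per traditional server; cores per kilowatt is added as a derived figure. Core counts say nothing about per-core speed, so this is only a starting point.

```python
# Assumed topology: 1,600 quad-core Redstone nodes vs. 400 dual-socket,
# 6-core-per-socket Xeon servers; power figures as quoted.
REDSTONE_NODES, REDSTONE_CORES_PER_NODE, REDSTONE_KW = 1600, 4, 9.9
XEON_SERVERS, XEON_SOCKETS, XEON_CORES_PER_SOCKET, XEON_KW = 400, 2, 6, 91.0

redstone_cores = REDSTONE_NODES * REDSTONE_CORES_PER_NODE
xeon_cores = XEON_SERVERS * XEON_SOCKETS * XEON_CORES_PER_SOCKET

print(f"Redstone: {redstone_cores} cores, {redstone_cores / REDSTONE_KW:.1f} cores/kW")
print(f"Xeon:     {xeon_cores} cores, {xeon_cores / XEON_KW:.1f} cores/kW")
```
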
The original Calxeda reference design from last year was a 2U rack-mounted chassis that crammed 120 processors (and hence server nodes)

also:

HP can cram three rows of these [4 CPU --jsn] ARM boards, with six per row, for a total of 72 server nodes

From that I conclude that in their calculations 1 CPU == 1 server.
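
A quick check of that inference, using only the numbers in the quoted sentence (3 rows, 6 boards per row, 4 CPUs per board, 72 nodes). The last line, how many 72-node groups a 1,600-node configuration would need, is an extrapolation that assumes every group is fully populated.

```python
import math

rows, boards_per_row, cpus_per_board = 3, 6, 4
boards = rows * boards_per_row            # 18 boards
cpus = boards * cpus_per_board            # 72 quad-core CPUs
quoted_nodes = 72                         # "for a total of 72 server nodes"
print(f"{boards} boards, {cpus} CPUs, {quoted_nodes / cpus:.0f} node(s) per CPU")

# Extrapolation: fully populated 72-node groups needed for 1,600 nodes.
print(math.ceil(1600 / quoted_nodes))
```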

Each CPU is a separate node in this configuration--separate DRAM, IO, etc.