by PhantomGremlin 3864 days ago
I know that Jeff is a demigod to some people, but I interpret this article as: "As a software guy, I don't really understand why I need this fancy hardware, so this can't be important". IMO he's wrong.

The margins between working and non-working DRAM these days are extremely small. E.g. Rowhammer demonstrated that even user-space programs could readily corrupt main memory by flipping bits in adjacent DRAM rows, without even trying very hard to do so.[1]

But, maybe in this case he's right. It's not like "open source Internet forum software" is anything that's mission critical. If there's an occasional garble in a character or two, will the latte-swilling hipsters even notice? :-)

Just like the original Google servers he points to. Who cares if they occasionally screwed up search results because they didn't have ECC memory? Overall the experience was still 100x better than using something like Altavista.

[1] https://en.wikipedia.org/wiki/Row_hammer

2 comments

What Jeff is trying to say is: if ECC is so desperately needed to prevent memory errors that are supposedly happening all the time, why isn't ECC in every computer everywhere?
That question is very easily answered.

The average consumer knows that more "jigabits" are better and more "jigahertz" is better (see Intel NetBurst for how badly that can go wrong).

See a link elsewhere in this thread: someone posted a memory error presentation that talked about FIT (failures in time). But the average consumer doesn't know what that is.
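For anyone unfamiliar with the unit: FIT is conventionally defined as one failure per 10^9 device-hours. A rough sketch of how that converts into expected errors per year follows. The FIT rate and device count are made-up placeholders for illustration, not measurements, and the function name is just something chosen for this example.

```python
HOURS_PER_YEAR = 24 * 365          # 8760, ignoring leap years
FIT_DEVICE_HOURS = 1_000_000_000   # 1 FIT = 1 failure per 10^9 device-hours


def expected_failures_per_year(fit_per_device: float, devices: int) -> float:
    """Expected number of failures per year across a population of devices."""
    return fit_per_device * devices * HOURS_PER_YEAR / FIT_DEVICE_HOURS


# Hypothetical inputs: 16 DRAM devices, each rated at 1000 FIT.
rate = expected_failures_per_year(fit_per_device=1000.0, devices=16)
print(f"expected failures per year: {rate:.5f}")
print(f"mean years between failures: {1 / rate:.1f}")
```

The point is only that a number that looks negligible for one box gets multiplied by the device count and the fleet size, which is why the same FIT rate matters far more in a datacenter than on one desktop.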

Hence we get a race to the bottom. PC assemblers are willing to sell their mothers into slavery if it can save them $0.05 in build cost. ECC doesn't fit into that narrative.

BTW ECC is "in every computer" nowadays. As yet another poster mentioned, Intel CPUs use ECC internally to protect their caches.

There are at least two broad classes of error correction and detection: at-rest and in-flight.

Each storage hierarchy component (RAM, SSD, CPU caches, etc.) and interconnection (chip-to-chip, add-on card, cable to another box) needs to be evaluated for the risk of undetected errors or data loss, weighed against the consequences for the intended use.

For example, billing database servers for a successful company probably should use RAID array/SAN/NAS (say RAID6 or ZFS with RAIDZ3) and Chipkill ECC memory on an enterprise-class box with decent vendor support.

CDN boxes for serving free, static content can be almost anything.

Larger shops have the economies of scale to ask OEMs and ODMs to build custom boxes that are better optimized than COTS gear from Dell, HP or CDW.

When Jeff's venture takes off, they might explore gear customized for running Ruby and/or partnering with 37signals and the like to have OEM/ODM folks develop better-performing gear and open source it like Facebook has.

Cost is king in a commodity market that IBM et al. left for more profitable waters.

Dell was pretty good at shaving pennies and providing WalMart-ized desktops and servers.

I think the offerings need to be optimized, cutting features down to just what's necessary for actual, intended uses, rather than guessing, throwing every possible feature into a retail desktop, or offering a blizzard of different, poorly-explained SKUs (what's the diff btwn A78Z-VX and A78C-VX+?)

Related, see also: http://cr.yp.to/hardware/ecc.html

"Non-parity" RAM probably started becoming common around the 1993-1995 period, when DRAM demand was increasing and prices were not falling much. For example, 4Mbit DRAM was costing more than $10 per chip during this period. Nowadays Intel uses ECC support for market segmentation.
Because the consequence of failure in most desktop scenarios is low, and doesn't justify the cost for mainstream use cases.

It does matter for stuff like big databases and ERP.

Just to be clear, SECDED ECC doesn't protect you against row hammer and similar memory disturbance attacks.
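To illustrate why, here is a toy sketch of SECDED (single error correction, double error detection) using an extended Hamming (8,4) code. This is an assumption-laden miniature: real ECC DIMMs typically use a (72,64) code, and the function names here are invented for the example. The property it shows is the same, though: one flipped bit is corrected, two are detected, and three or more can be silently "corrected" into wrong data.

```python
from typing import List, Optional, Tuple

DATA_POSITIONS = (3, 5, 6, 7)    # codeword positions that hold the 4 data bits
PARITY_POSITIONS = (1, 2, 4)     # Hamming parity bits; position 0 is overall parity


def encode(nibble: int) -> List[int]:
    """Encode a 4-bit value into an 8-bit extended Hamming codeword."""
    code = [0] * 8
    for bit, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> bit) & 1
    for p in PARITY_POSITIONS:
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code[1:]) % 2  # make the parity of the whole word even
    return code


def decode(received: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, value); value is None when the error is uncorrectable."""
    code = list(received)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    odd_parity = sum(code) % 2 == 1
    if odd_parity:
        # Odd number of flips: the decoder assumes exactly one and fixes it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        return "uncorrectable", None  # even number of flips, typically two
    else:
        status = "ok"
    value = sum(code[pos] << bit for bit, pos in enumerate(DATA_POSITIONS))
    return status, value


def flip(code: List[int], positions: List[int]) -> List[int]:
    """Return a copy of the codeword with the given bit positions inverted."""
    out = list(code)
    for pos in positions:
        out[pos] ^= 1
    return out


word = encode(0b1010)
print(decode(word))                    # ('ok', 10)
print(decode(flip(word, [5])))         # ('corrected', 10)
print(decode(flip(word, [5, 6])))      # ('uncorrectable', None)
print(decode(flip(word, [3, 5, 6])))   # ('corrected', 13): silently wrong
```

The last case is the relevant one for disturbance attacks: if enough bits flip within a single ECC word, the decoder can report a successful correction while handing back corrupted data.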

DDR4 implemented some mitigations against such attacks, as well as some additional soft ECC mechanisms, but as these types of attacks are fairly new it's not yet clear how effective they are.