by bashtoni 5248 days ago
Apples / Oranges.

Quite apart from the ability to buy by the hour, the Hetzner offering really isn't suitable for processing any data you care about - it's not using ECC RAM, and the processor used doesn't support it.


Why is this being downvoted? The processor used (i7-2600) does not support ECC RAM, and this platform does not include it. A server platform without ECC RAM is at a pretty big disadvantage for several reasons, the most compelling being the high rate of memory errors in real systems. Memory errors in non-ECC systems can be extremely difficult and expensive to track down, and can cause data corruption or loss if the machine is used in the data pipeline.

In one study (http://www.cs.toronto.edu/%7Ebianca/papers/sigmetrics09.pdf), strong correlations were found between age, usage and error rate. Before using a server without ECC RAM, it's worth doing a quick estimate of the amount of data you intend to move around, and whether it's worth it to you (and, more importantly, your customers) to save a little money going with a non-ECC platform.

What would be the downside to using non-ECC RAM in a heavily loaded server? Would it cause processes to crash?
If you are lucky, processes would crash. More likely there would be silent bits of data corruption.
Gotcha, so it would just depend on what bits in RAM were corrupted (data versus process instructions)?

I am curious about a couple of specific use cases that I can think of where this might affect me (and, most likely, would be common points of failure for others):

1. data in the MySQL tables (stored in memory) is corrupted. Would MySQL crash? Indicate table corruption, so I could just reload it from disk? Write corrupted data to disk, and permanently trash my data collection?

2. large process that's running data analysis (say a big python process with tons of data in RAM). Would one of my variables (say an int with value 4) turn into another number? Would it become unreadable?

I appreciate the effort to explain this. I know, in theory, why ECC RAM is useful, but I have difficulty visualizing real world scenarios.

If it was a C int then it would just change to another value. If it was a Python int then two things can happen: either the bit flip was in the value, which causes the value to change, or the bit flip was in the tag bits, which causes Python to interpret the data as something other than an int. The latter would most likely cause your program to crash.
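To make the C int case concrete, here is a minimal sketch that packs the value 4 as a 32-bit little-endian integer and inverts a single bit of that representation. The helper names `flip_bit` and `flipped_int32` are made up for illustration, and the 32-bit little-endian layout is an assumption; a real memory error obviously does not go through a function call.

```python
import struct


def flip_bit(buf: bytes, bit_index: int) -> bytes:
    """Return a copy of buf with exactly one bit inverted."""
    data = bytearray(buf)
    data[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(data)


def flipped_int32(value: int, bit_index: int) -> int:
    """Pack value as a little-endian int32, flip one bit, unpack it again."""
    raw = struct.pack("<i", value)
    return struct.unpack("<i", flip_bit(raw, bit_index))[0]


if __name__ == "__main__":
    # Show what the int 4 turns into for a few different flipped bits.
    for bit in (0, 1, 2, 31):
        print(f"bit {bit}: 4 -> {flipped_int32(4, bit)}")
```

Flipping a low bit nudges the value slightly, while flipping the top bit changes the sign, so the damage depends entirely on which bit is hit.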

With MySQL, any of those things can happen. If you're lucky then only the cache is corrupted and you can just reload from disk. If you're unlucky then the data got corrupted on its way to disk and the wrong data will be written to disk. If you are astronomically unlucky then the in-memory machine code of MySQL got changed in such a way that it starts overwriting your entire disk with garbage. You should probably be more afraid of meteorites though. And of bugs in either your own or others' code.

ECC RAM reduces the probability of such a bit flip happening. It doesn't eliminate bit flips entirely. So you have to do these two things in any case:

1. Bit flips can cause processes to misbehave/crash. So you want to have a way to detect and restart misbehaving/crashed processes.

2. Even with ECC RAM you want to do your own error correction for critical data (say a bank transaction log).
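As a rough illustration of the second point, here is a minimal sketch that frames each log record with its length and a CRC32 so that a corrupted record is at least noticed when it is read back. The record layout and the names `encode_record` and `decode_record` are assumptions for this example, not anything a particular database does. Note that a CRC only detects corruption; actually correcting it needs redundancy, such as a replica or a real error-correcting code.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload


def encode_record(payload: bytes) -> bytes:
    """Prefix payload with its length and CRC32 so corruption can be detected."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def decode_record(record: bytes) -> bytes:
    """Return the payload, or raise ValueError if the record looks corrupted."""
    if len(record) < HEADER.size:
        raise ValueError("record too short")
    length, expected_crc = HEADER.unpack(record[:HEADER.size])
    payload = record[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != expected_crc:
        raise ValueError("checksum mismatch: record is corrupted")
    return payload


if __name__ == "__main__":
    record = bytearray(encode_record(b"debit account 42 by 100"))
    record[12] ^= 0x04  # simulate a single flipped bit in the payload
    try:
        decode_record(bytes(record))
    except ValueError as exc:
        print("detected:", exc)
```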

Here is an interesting paper that discusses the prevalence of DRAM errors and the effectiveness of ECC RAM:

DRAM Errors in the Wild: A Large-Scale Field Study -- http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf

It would be interesting if somebody did an experiment where they artificially flipped bits of various software's memory to see what happens. I'd expect that in many cases it doesn't do any harm at all.
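A toy version of that experiment is easy to sketch. The code below flips one random bit in a serialized JSON document and checks whether parsing fails, silently yields different data, or yields the same data. The sample document, the three outcome labels, and the function names are all made up for illustration; flipping bits in a byte buffer is only a stand-in for corrupting a live process's memory.

```python
import json
import random


def flip_random_bit(buf: bytes, rng: random.Random) -> bytes:
    """Return a copy of buf with one randomly chosen bit inverted."""
    data = bytearray(buf)
    bit = rng.randrange(len(data) * 8)
    data[bit // 8] ^= 1 << (bit % 8)
    return bytes(data)


def classify(original: bytes, corrupted: bytes) -> str:
    """'crash' if parsing fails, 'silent' if the parsed data differs, else 'harmless'."""
    try:
        value = json.loads(corrupted.decode("utf-8"))
    except ValueError:  # covers UnicodeDecodeError and json.JSONDecodeError
        return "crash"
    same = value == json.loads(original.decode("utf-8"))
    return "harmless" if same else "silent"


def run_experiment(trials: int = 1000, seed: int = 0) -> dict:
    """Flip one bit per trial in a small sample document and tally the outcomes."""
    rng = random.Random(seed)
    doc = json.dumps({"name": "Travis", "balance": 4, "tags": ["a", "b"]}).encode("utf-8")
    counts = {"crash": 0, "silent": 0, "harmless": 0}
    for _ in range(trials):
        counts[classify(doc, flip_random_bit(doc, rng))] += 1
    return counts


if __name__ == "__main__":
    print(run_experiment())
```

The split between outcomes will depend heavily on the data format: a strict text format like JSON turns many flips into parse errors, whereas raw binary data has no such safety net.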

I suggest looking into studies of radiation effects upon computer systems. They do a lot of bit-flipping. I was privy to results from a confidential study once, and as one might expect, enough bit flips cause big problems (the study went into more details than that, of course).
The answer to most of your questions is 'maybe', unfortunately. Reasoning about the things that could happen when memory errors occur is very difficult, because they occur outside the mental model of computation that most programmers (and systems administrators) use.

Let's use MySQL as an example. A bit flip in the memory which holds the code may cause it to crash. A bit flip in the 'metadata' could cause the table to become corrupted, potentially recoverably. A bit flip in the data itself could turn 'Travis' into 'Trcvis', which might go undetected depending on where it happened and which storage engine you are using.

The use of memory for OS page caching (less so in databases, which often use O_DIRECT, and more so in other programs) means that arbitrary corruption could happen to pieces of disk data your program didn't even touch, if you touch data near them.

As someone who runs physical servers for hosting, I can say you really get what you pay for.

I've had non-ECC RAM systems destroy database tables leading to data loss.

Systems with ECC are either able to correct the error (and log it), or raise an alarm. Even discounting RAM bit flips, simple bad RAM can destroy your data. Having an ECC-aware system (including getting Linux to check the EDAC counters or having the baseboard management controller do so) has saved me many times from failing hardware.
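For the Linux side of that, the EDAC subsystem exposes per-memory-controller error counters in sysfs. Here is a minimal sketch that reads them, assuming an EDAC driver for your chipset is loaded and the usual `/sys/devices/system/edac/mc/mc*/ce_count` and `ue_count` files are present. The function name and the returned dictionary shape are made up for this example.

```python
from pathlib import Path


def read_edac_counts(base: str = "/sys/devices/system/edac/mc") -> dict:
    """Return {controller: (corrected_errors, uncorrected_errors)} from sysfs."""
    counts = {}
    root = Path(base)
    if not root.is_dir():
        return counts  # no EDAC driver loaded, or not a Linux system
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers without readable counters
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

Running something like this from cron and alerting when the corrected count starts climbing is one way to spot a failing DIMM before it produces uncorrectable errors.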

If you're using an AWS m2.4xlarge (score: 1511) then you pay $2.00 per hour. If you're using this Hetzner (score: 1729) you pay less than $0.09 per hour. You'd have to use very few hours indeed for AWS to be cheaper. That's not even factoring in traffic. You get 10TB per month with Hetzner which would cost you an additional $1200/month (!) with Amazon. AFAICT Amazon doesn't use ECC RAM either; at least I don't see it mentioned anywhere. Hetzner does have competitively priced servers with ECC RAM.
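The "very few hours" claim can be checked with a quick back-of-the-envelope calculation. The hourly prices below are the ones quoted in this comment; the 730 hours per month figure is an assumption (an average month), and setup fees and traffic are ignored.

```python
HOURS_PER_MONTH = 730  # assumption: average month length in hours


def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of running one machine for the given number of hours."""
    return hourly_rate * hours


def break_even_hours(on_demand_hourly: float, flat_monthly: float) -> float:
    """Hours per month below which paying by the hour beats the flat monthly fee."""
    return flat_monthly / on_demand_hourly


if __name__ == "__main__":
    hetzner_monthly = monthly_cost(0.09)  # upper bound, since the rate is "less than $0.09"
    aws_monthly = monthly_cost(2.00)
    print(f"Hetzner, full month: ${hetzner_monthly:.2f}")
    print(f"AWS, full month:     ${aws_monthly:.2f}")
    print(f"Break-even usage:    {break_even_hours(2.00, hetzner_monthly):.2f} hours/month")
```

Under these assumptions the break-even point is a bit under 33 hours of AWS usage per month, or roughly an hour a day.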
You could use the EX6:

http://www.hetzner.de/en/hosting/produkte_rootserver/ex6

It is a little more expensive (69 EUR) and has less RAM (16 GB), but it uses ECC.

Or use someone like providerservice.com, who offer almost the same specs with no setup fee.
Do you have a link to where Amazon says EC2 uses ECC RAM? I can't seem to find one.
I can't find one either. In fact, AWS doesn't seem to be running with ECC RAM except for some of the GPU instances. Happy to be corrected here.
You can get the one with ECC (EX 6); it's just 10 euros extra per month, which is still way cheaper than AWS.