by conistonwater 4093 days ago
Can somebody weigh in on their claims of performance differences? IIUC that's the strongest argument so far against sequential consistency by default, yet I'm not sure I understand their evidence. I haven't followed their references yet, but after reading their arguments I'm not sure just what the overhead is now, and what they expect it to be in the future (tbh, I would have expected them to be much clearer about this point, since it is so crucial). They say they expect the overhead to be reduced substantially to the point of not being worth caring about, but is that actually true/likely? I'm not familiar with this enough to judge on my own. My suspicion here is that they (cheekily) say that the overhead can be reduced, without having to prove that it will be reduced.

The alternative, if it is genuinely more expensive to implement SC guarantees in hardware, is that we simply stop teaching people that "A;B" means A is executed and then B. Maybe there really should be an exception saying "but nobody is allowed to look at any intermediate states, unless explicitly allowed". We could also just teach the full meaning of it from the start; it can't be that difficult. Their argument seems to be that non-SC is much less convenient, which I agree with.
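To make the "A;B" intuition concrete, here is a minimal sketch (not taken from the paper) of the classic store-buffering litmus test. It assumes the textbook definition of SC: every execution is some interleaving of the threads' steps that keeps each thread's program order. The names `State`, `step`, `explore` and `sc_outcomes` are made up for illustration. The code enumerates every such interleaving and collects the possible (r0, r1) results.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Store-buffering litmus test, with x and y initially 0:
//   Thread 0:  x = 1;  r0 = y;
//   Thread 1:  y = 1;  r1 = x;
// Illustrative model only: shared memory is a plain struct and each thread
// has exactly two steps.
struct State { int x = 0, y = 0, r0 = 0, r1 = 0; };

// Execute step `pc` (0 or 1) of thread `tid` on state `s`.
inline void step(State& s, int tid, int pc) {
    if (tid == 0) { if (pc == 0) s.x = 1; else s.r0 = s.y; }
    else          { if (pc == 0) s.y = 1; else s.r1 = s.x; }
}

// Recursively try every interleaving that preserves program order.
inline void explore(State s, int pc0, int pc1,
                    std::set<std::pair<int, int>>& out) {
    assert(pc0 <= 2 && pc1 <= 2);  // each thread has two steps
    if (pc0 == 2 && pc1 == 2) { out.insert(std::make_pair(s.r0, s.r1)); return; }
    if (pc0 < 2) { State t = s; step(t, 0, pc0); explore(t, pc0 + 1, pc1, out); }
    if (pc1 < 2) { State t = s; step(t, 1, pc1); explore(t, pc0, pc1 + 1, out); }
}

// All (r0, r1) results that a sequentially consistent execution can produce.
inline std::set<std::pair<int, int>> sc_outcomes() {
    std::set<std::pair<int, int>> out;
    explore(State{}, 0, 0, out);
    return out;
}
```

Under this model the result r0 == 0 and r1 == 0 never appears, because whichever load runs last must see the other thread's store. Hardware with store buffers and no fences can produce exactly that result, which is the kind of "intermediate state" a non-SC model forces programmers to reason about.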

On a scale from plenty-real to not-real-at-all, how real are the hardware performance limitations exactly?


[I am one of the authors of the paper.]

The paper reports overhead numbers from existing research. For instance, see Figure 18 in http://arxiv.org/abs/1312.1411, which shows the cost of SC for memcached - 1% on x86 and 3% on ARM.

This overhead is primarily due to the cost of fences on existing hardware. What we (not so cheekily) say is that this is likely to get better as hardware platforms provide better implementations for fences.

Hi, thanks for replying.

> The paper reports overhead numbers from existing research. For instance, see Figure 18 in http://arxiv.org/abs/1312.1411, which shows the cost of SC for memcached - 1% on x86 and 3% on ARM.

But that's the bit I don't find nearly convincing enough. You say (p.5) that you're going to "rebut these arguments" that "SC is too expensive", but the main figure of 1%/3% is from a non-standard research-level static analysis tool that, if I read that paper correctly, works on programs of up to 10k LOC and takes a few minutes to run, producing the 1%/3% figure. Can that really be generalised? I'm not quite sure. The other tools in the comparison did much worse, which I think may be closer to what one would get in practice. So I think that's not a good rebuttal: if you consider the tools actually available, SC may well be too expensive.

I'm not saying you're wrong, just that I don't think you've proven your case very clearly. I was kind of expecting a much clearer rebuttal than I found, sorry about the snark.

Yes, you did read correctly. The 1-3% does assume a non-standard whole-program analysis. Something more practical on existing hardware will look more like the E numbers (for escape analysis): 17.5% on x86 and 6.8% on ARM. An even dumber analysis (H) adds up to 37.5% overhead on x86.

It is important to realize that the overhead numbers are not so huge, like 5x or 20x, that SC can simply be written off.

As we say in the paper, these overheads, however small they might be, will be unacceptable for low-level libraries written in C/C++. The main argument of the paper was that these overheads are acceptable for "safe" languages like Java/C#, which already add other (useful) overheads such as garbage collection, bounds checking etc. to help out the programmer.

Even for C/C++, it will still make sense to have SC by default - the programmer is responsible for explicitly turning off SC semantics in parts of the code where the inserted fences hurt performance. This is a much better tradeoff - safe by default and performance by choice.
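As a rough analogy for "safe by default, performance by choice", C++11 atomics already work this way, though only for variables declared `std::atomic` rather than for all memory accesses as discussed above. Operations default to `std::memory_order_seq_cst`, and a weaker ordering has to be requested explicitly. The sketch below is only an illustration: the `Mailbox` type and its member names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical example type; names are illustrative only.
struct Mailbox {
    int payload = 0;                  // plain data, published via `ready`
    std::atomic<bool> ready{false};   // operations default to seq_cst
    std::atomic<long> hits{0};        // statistics counter

    void publish(int value) {
        payload = value;
        ready.store(true);            // default: std::memory_order_seq_cst
    }

    bool try_read(int& result) {
        if (!ready.load()) return false;  // default: std::memory_order_seq_cst
        result = payload;
        return true;
    }

    // Explicit opt-out: the counter only needs atomicity, not ordering,
    // so the programmer asks for relaxed semantics by name.
    void count_hit() { hits.fetch_add(1, std::memory_order_relaxed); }
    long hit_count() const { return hits.load(std::memory_order_relaxed); }
};

// Single-threaded usage check; it exercises the API but does not
// demonstrate any cross-thread ordering.
inline void mailbox_demo() {
    Mailbox m;
    int v = -1;
    assert(!m.try_read(v));
    m.publish(42);
    assert(m.try_read(v) && v == 42);
    m.count_hit();
    assert(m.hit_count() == 1);
}
```

The relaxed operations are the only place where ordering is weakened, and they are easy to find by searching for `memory_order_relaxed`. Whether the default ordering actually costs anything at a given call site depends on the target hardware and compiler.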