| The standard deviation is simply a measure of the spread of a distribution. There's nothing right or wrong about a high standard deviation; it just means you should expect highly variable performance. Look at the figures: Slicehost's performance follows a sawtooth-like pattern. The standard deviation is useful because it quantifies what to expect. For roughly normally distributed data, plus or minus one standard deviation covers about 2/3 of the measurements. If you think about the problem a little, you might be more worried about the standard deviation of the standard deviation, that is, how well the spread itself has been measured. That would in fact be a useful quantity, but it is hard to measure. EDIT below this line -------
Several comments below suggest that the SD is somehow less useful if it's "large" (or large relative to the mean, or whatever). The reason people think a large SD indicates a poor experiment is that in school lab classes one calculates the SD and calls it the "error". The standard deviation is a measure of spread: if it's large, then the spread is large. Knowing the spread has value. In this case, under the parent's experimental conditions, EC2's performance is more consistent than Slicehost's. A fair critique of the blog posting is that the error on the standard deviation may be large, depending on the experimental conditions. It is _not_ a fair critique to say that the SD is too high to make a prediction; you just have a larger performance spread. Note that the performance spread described is not necessarily "error". The spread is inherent either to the server (as implied by the article) or to the method (in which case it is an error). |
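To make the "error on the standard deviation" idea concrete, here is a rough Python sketch. It is not the blog author's method. The timing data is made up (a steady series and a sawtooth-like one), the function names are hypothetical, and the bootstrap is just one common way to estimate how uncertain a measured SD is.

```python
import random
import statistics


def fraction_within_one_sd(samples):
    """Fraction of samples lying within one sample SD of the mean."""
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)
    inside = sum(1 for x in samples if abs(x - mean) <= sd)
    return inside / len(samples)


def bootstrap_sd_error(samples, n_resamples=2000, seed=0):
    """Estimate the uncertainty of the SD by resampling with replacement.

    Returns the standard deviation of the SDs computed on the resamples,
    i.e. a "standard deviation of the standard deviation".
    """
    rng = random.Random(seed)
    n = len(samples)
    sds = []
    for _ in range(n_resamples):
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        sds.append(statistics.stdev(resample))
    return statistics.stdev(sds)


# Hypothetical response times in ms; NOT the measurements from the blog post.
rng = random.Random(42)
steady = [rng.gauss(100.0, 5.0) for _ in range(200)]
sawtooth = [100.0 + 40.0 * ((i % 20) / 20.0) + rng.gauss(0.0, 2.0)
            for i in range(200)]

for name, data in (("steady", steady), ("sawtooth", sawtooth)):
    print(name,
          "SD:", round(statistics.stdev(data), 2),
          "SD error:", round(bootstrap_sd_error(data), 2),
          "within 1 SD:", round(fraction_within_one_sd(data), 2))
```

The larger SD of the sawtooth series is still a perfectly usable number; it just describes a wider spread. Note that the ~2/3 rule is a property of roughly normal data, so the fraction for a sawtooth-like series can differ from it. The bootstrap also treats the samples as independent, which successive timings from one server may not be.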