Hacker News
by menscher 2067 days ago
The trend-lines were really just to guide the eye (the text gives this context), but if you really must know: the R^2 for all three fits was above 0.9, suggesting the exponential growth model is reasonable.
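
The comment doesn't say how the R^2 for the exponential trend lines was computed. A common approach is to fit a straight line to ln(y) and report the R^2 of that line. Here is a minimal sketch under that assumption. The function name `fit_exponential` and the sample data are hypothetical. An R^2 computed in log space is not directly comparable with one computed for a linear fit in the original units.

```python
import math


def fit_exponential(xs, ys):
    """Fit y ~ a * exp(b * x) by ordinary least squares on ln(y).

    Returns (a, b, r2), where r2 is the coefficient of determination of the
    straight-line fit in log space, not in the original units.
    """
    if any(y <= 0 for y in ys):
        raise ValueError("a log-space exponential fit needs strictly positive y")
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_ly = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (ly - mean_ly) for x, ly in zip(xs, logs))
    b = sxl / sxx                   # slope of ln(y) against x = growth rate
    ln_a = mean_ly - b * mean_x     # intercept of ln(y)
    ss_res = sum((ly - (ln_a + b * x)) ** 2 for x, ly in zip(xs, logs))
    ss_tot = sum((ly - mean_ly) ** 2 for ly in logs)
    return math.exp(ln_a), b, 1.0 - ss_res / ss_tot


# Hypothetical data growing by exactly 50% per step, so r2 is 1 up to rounding.
xs = [0, 1, 2, 3, 4, 5]
ys = [10 * 1.5 ** x for x in xs]
a, b, r2 = fit_exponential(xs, ys)
```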

As you say, the pps and bps don't show much curvature, so a linear fit could indeed work for them over the displayed portion of the graph. But it's nonsensical when you look further back in time... predicting negative attack volumes prior to mid-2011. ;)
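
The backward-extrapolation point can be made concrete. A fitted line y = intercept + slope * x predicts zero at x = -intercept / slope and negative values before that. The sketch below uses hypothetical data (x is years since 2010), chosen so the fitted line crosses zero at x = 1.5, i.e. mid-2011. It is not the attack data from the graph, and `fit_line` and `zero_crossing` are illustrative names.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def zero_crossing(intercept, slope):
    """x at which the fitted line predicts zero; earlier x values predict negatives."""
    return -intercept / slope


# Hypothetical data: years since 2010 vs. attack volume (arbitrary units).
years = [4, 5, 6, 7, 8, 9]
volume = [5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
intercept, slope = fit_line(years, volume)
print(zero_crossing(intercept, slope))  # → 1.5
```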

5 comments

> the R^2 for all three fits was above 0.9, suggesting the exponential growth model is reasonable

What is the R^2 for other fits? Say linear or quadratic (the most obvious alternative choices)?
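
One way to answer that is to score every candidate in the same units. The sketch below assumes plain least squares in the original units for the linear and quadratic fits, and a log-space fit for the exponential that is back-transformed before scoring. `polyfit`, `r_squared` and `compare_fits` are hypothetical helpers, not from the original post. The normal equations used here are fine for a handful of points but can become ill-conditioned for large x or higher degrees.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    m = degree + 1
    # A^T A and A^T y for the Vandermonde matrix A[i][j] = xs[i] ** j.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    aug = [row + [rhs] for row, rhs in zip(ata, aty)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution.
    coeffs = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(aug[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (aug[r][m] - tail) / aug[r][r]
    return coeffs


def r_squared(ys, preds):
    """Coefficient of determination of predictions against observed ys."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def compare_fits(xs, ys):
    """R^2 of linear, quadratic and exponential fits, all scored in the original units."""
    scores = {}
    for name, degree in (("linear", 1), ("quadratic", 2)):
        c = polyfit(xs, ys, degree)
        preds = [sum(ck * x ** k for k, ck in enumerate(c)) for x in xs]
        scores[name] = r_squared(ys, preds)
    # Exponential: fit a line to ln(y), then score the back-transformed curve.
    ln_a, b = polyfit(xs, [math.log(y) for y in ys], 1)
    scores["exponential"] = r_squared(ys, [math.exp(ln_a + b * x) for x in xs])
    return scores
```

Because the quadratic model contains the linear one, its R^2 on the same data can never be lower, so a small gain in R^2 from the extra parameter is weak evidence on its own.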

> it's non-sensical when you look further back in time

Of course, because the factors that are at work in producing the data are not constant in time. So there is no reason to expect a single curve fit with a single set of parameters to be applicable for all times.

The criticism that it would go negative in the past is meaningless. Of course it’s unphysical, but that doesn’t mean a linear model isn’t appropriate for today’s data.

You would just choose a point in time in the past before which you believe the model is inapplicable. Maybe a different linear model held then, or maybe the data was mostly constant; who cares? We’re modeling today’s range of data, which might be well explained by a linear model. It doesn’t really matter what 2011’s data was doing unless we separately believe the same growth law had to apply across all the years, and I see no reason why it would.
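
Fitting only over the range where a growth law is believed to hold is simple to do. The sketch below assumes the cutoff `start` is chosen by hand. `fit_window` and `predict` are hypothetical names, and the model refuses to predict before its window rather than extrapolating into it.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_window(xs, ys, start):
    """Fit a line using only points with x >= start; earlier points are ignored."""
    pts = [(x, y) for x, y in zip(xs, ys) if x >= start]
    if len(pts) < 2:
        raise ValueError("need at least two points in the window")
    wx, wy = zip(*pts)
    intercept, slope = fit_line(list(wx), list(wy))
    return intercept, slope, start


def predict(model, x):
    """Evaluate the windowed line, refusing to extrapolate before its window."""
    intercept, slope, start = model
    if x < start:
        raise ValueError("x is earlier than the range the model was fitted on")
    return intercept + slope * x


# Hypothetical series: flat at first, then linear growth from x = 3 onward.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 3, 5, 7, 9]
model = fit_window(xs, ys, start=3)
```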

> The trend-lines were really just to guide the eye

That's the problem, no? You can draw a line/curve over any data and convince a lot of people that it's relevant, but that's more cognitive bias than anything.

Perhaps here it would be more advisable to just describe the data, and leave inference to the reader.

Prediction is hard, especially about the future ;)

Another alternative model besides linear would be quadratic (or another X^(1+epsilon) polynomial where epsilon is small). This would avoid the problem of negative data and likely fit the data better than an exponential.
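
A power law y = c * x^p with p = 1 + epsilon can be fitted with the same log trick, this time on both axes. The sketch below assumes x > 0 and y > 0, and `fit_power_law` is a hypothetical name. One caveat: the fitted exponent depends on where x = 0 is placed on the time axis, so choosing the origin is itself a modeling assumption. For x >= 0 and c > 0 the curve never goes negative, which is the property the comment points to.

```python
import math


def fit_power_law(xs, ys):
    """Fit y ~ c * x**p by least squares on (ln x, ln y); needs x > 0 and y > 0."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    # Slope in log-log space is the exponent p; epsilon would be p - 1.
    p = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum((a - mx) ** 2 for a in lx)
    c = math.exp(my - p * mx)
    return c, p


# Hypothetical data following y = 2 * x**1.2 exactly (epsilon = 0.2).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
c, p = fit_power_law(xs, [2 * x ** 1.2 for x in xs])
```
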
But why?

I think the question is really about the growth in the number of connected, compromised devices. Growth curves are often sigmoid-shaped, meaning exponential until they're not. The exponential is often great for modeling growth trends up until the plateau, but it's hard to know when the corner will turn.
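
The "exponential until it isn't" behavior is easy to see numerically. The sketch below compares an exponential with a logistic (sigmoid) curve that shares its starting value and growth rate. The parameters `n0`, `r` and the plateau `k` are hypothetical. The ratio of the two stays near 1 early on and collapses once the logistic curve approaches k.

```python
import math


def exponential(t, n0, r):
    """Unbounded growth from n0 at continuous rate r."""
    return n0 * math.exp(r * t)


def logistic(t, n0, r, k):
    """Logistic curve with N(0) = n0, growth rate r and carrying capacity k."""
    return k / (1.0 + (k / n0 - 1.0) * math.exp(-r * t))


# Hypothetical parameters: start at 1 unit, rate 0.5 per time step, plateau at 1000.
n0, r, k = 1.0, 0.5, 1000.0
for t in range(0, 21, 4):
    e, s = exponential(t, n0, r), logistic(t, n0, r, k)
    print(f"t={t:2d}  exponential={e:10.1f}  logistic={s:7.1f}  ratio={s / e:.3f}")
```

From data on the early part of the curve alone, the two models are nearly indistinguishable, which is why the turning point is hard to predict.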

Exponentials are also well motivated by differential equations... (Say, if you're modeling growth of IoT devices based on word of mouth.) Polynomials with degree 1+epsilon, less so.
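
The differential-equation motivation can be written out. This sketch assumes each connected device recruits new ones at a constant per-device rate r (word of mouth), with N_0 the initial count and K a hypothetical saturation level. The first equation gives pure exponential growth; the second adds saturation and yields the sigmoid shape described above, which behaves like the exponential while N is much smaller than K.

```latex
\frac{dN}{dt} = rN
\quad\Longrightarrow\quad
N(t) = N_0 e^{rt}

\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)
\quad\Longrightarrow\quad
N(t) = \frac{K}{1 + \left(\frac{K}{N_0} - 1\right) e^{-rt}}
```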

Exactly. I wonder why some people seem to want other equations instead; how is that better?