by daniel-levin 4694 days ago
Agreed. I'm studying statistics at the moment and I'm continually reminded of how easy it is to choose the wrong model / distribution and be incorrect because of some non-obvious and technical reason. For example, just the other day, I wanted to use the binomial distribution to solve a problem. To use this distribution, the trials must be independent of one another. In that particular problem, there was a subtle condition that made the trials non-independent. I arrived at correct-appearing answers (0 <= P <= 1) that were actually all wrong. Statistics is way too easy to break to be used naively.
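To make that failure mode concrete, here is a minimal sketch. The comment does not say what the subtle condition was, so this assumes one common case: drawing without replacement, where each draw changes the odds of the next. The urn numbers (10 items, 4 defective, 3 draws) and the function names are made up for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeometric_pmf(k, population, successes, draws):
    """P(exactly k successes in `draws` draws WITHOUT replacement)."""
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )

# Hypothetical setup: 10 items, 4 defective, draw 3, want exactly 2 defective.
naive = binomial_pmf(2, 3, 4 / 10)       # wrongly treats the draws as independent
exact = hypergeometric_pmf(2, 10, 4, 3)  # accounts for the dependence between draws

print(f"binomial (wrong model):   {naive:.4f}")
print(f"hypergeometric (correct): {exact:.4f}")
```

Both results are valid-looking probabilities between 0 and 1, and nothing in the output signals that the first one came from the wrong model.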

> Statistics is way too easy to break to be used naively.

Fair enough, but the same argument could be made about applying an unskewed (symmetric) distribution such as the normal to asymmetrical datasets, a common error even among people who should know better.
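A rough sketch of that kind of error, assuming the symmetric model in question is a normal distribution and using Python's standard `statistics` module. The wait times below are invented, right-skewed, and strictly positive; they are only there to show the effect.

```python
from statistics import NormalDist, mean, median

# Hypothetical right-skewed sample: wait times in minutes, all positive.
wait_times = [1, 1, 2, 2, 3, 4, 5, 8, 13, 40]

# Fit a symmetric normal model using the sample mean and standard deviation.
model = NormalDist.from_samples(wait_times)

# The model assigns real probability to negative wait times,
# even though no such value exists (or can exist) in the data.
p_negative = model.cdf(0)
observed_negative = sum(t < 0 for t in wait_times) / len(wait_times)

print(f"mean = {mean(wait_times)}, median = {median(wait_times)}")
print(f"model P(wait < 0)  = {p_negative:.3f}")
print(f"observed fraction  = {observed_negative:.3f}")
```

The gap between the mean and the median is a quick hint that the data is skewed and that a symmetric model may not fit.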

I think binomial functions should be included, on the grounds that they're very useful and no more likely to be misused than the continuous distributions.

Hell, sometimes they use a dictionary when they should be using a list. Almost everything can be used wrongly by a beginner, which doesn't mean it shouldn't be there.

I think having a basic stats module always handy would be very convenient.

> I think having a basic stats module always handy would be very convenient.

I absolutely agree. My only point was that these tools are sometimes misapplied, not at all to argue that they shouldn't be readily available. They should be.

While you are correct, it sounds like his point was that even someone moderately skilled can easily make a mistake that makes their work __completely__ invalid, rather than merely inefficient or overly complicated.