by nicops 2875 days ago
That's all fine and dandy, but conceptually it's wrong to say the average of an empty array is 0, and it can and will lead to wrong results in a variety of cases. I'm sure you can think of a lot of these cases yourself. I think in the history of computer science we programmers have found that there are a lot of convenience shortcuts that make sense in a lot of cases but bite our asses in others. Implicit is fast and fun, but it's nice to have your seatbelt on when the car crashes. Going back to the average case: if you want an average function that returns 0 on empty arrays, fine. But that's not the average function, and you shouldn't call it that. Names matter, so you should call it averageOrZero or something like that.
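
A minimal sketch of the naming distinction argued for here, assuming Python (the thread does not name a language). The names `average` and `average_or_zero` are illustrative, the latter adapted from the `averageOrZero` suggestion, and neither is any library's API:

```python
def average(values):
    """Arithmetic mean of a non-empty sequence of numbers.

    Raises ValueError on empty input rather than picking a value silently.
    """
    if len(values) == 0:
        raise ValueError("average() of an empty sequence is undefined")
    return sum(values) / len(values)


def average_or_zero(values):
    """Explicitly named variant (hypothetical): returns 0.0 for empty input."""
    if len(values) == 0:
        return 0.0
    return average(values)
```

With this split, a caller who wants the zero fallback has to ask for it by name, and anyone reading the call site can see which behavior is in play.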

Why is it conceptually wrong to say the average of an empty array is zero? My undergrad degree is in pure math and my grad degree is in mathematical statistics, and I’ve never heard the idea that saying the mean of an empty array is zero is “conceptually wrong.”

You bring up the history of CS, but even there you have debates about what convention to use for defining 1/0 for function totality and theorem provers.

Nothing in the pure math derivation of number systems, on up through vector spaces, definitively makes a zero mean for an empty array ill-defined. Whatever choice you make (positive infinity, undefined, 0, any finite value, etc.) is purely a convention that depends on your use case.
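
One way to picture "purely a convention" is a mean function where the empty-input value is an explicit parameter. This is only a sketch, assuming Python; the `mean` function and its `empty` parameter are hypothetical and not taken from any particular library:

```python
import math


def mean(values, empty=math.nan):
    """Arithmetic mean; `empty` is the value returned for empty input.

    The default of NaN is itself just one convention among several.
    """
    values = list(values)
    if not values:
        return empty
    return sum(values) / len(values)
```

A caller could then write `mean(xs, empty=0.0)` or `mean(xs, empty=math.inf)` depending on the use case, which keeps the chosen convention visible where the function is used.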

> Why is it conceptually wrong to say the average of an empty array is zero?

It’s not conceptually wrong, it just means the “mean” you’re referring to calculates a different value than the “mean” we’re taught in school. So, underlying assumptions about the differences in “mean” should be communicated where it’s used.

Sure, I agree they should be communicated. Like, in the docs for “standard” mean functions, and not pushed into “specialized” mean functions, since needing this particular convention is not remotely special, and is rudimentary and expected in 99% of linear algebra and data analytics work, which are the largest drivers of these types of statistical functions.