by LeonB 410 days ago
An “average” (whether a mean, a median, etc.) is a very lossy compression algorithm.

You’re attempting to describe a whole series of numbers with just one (or two) numbers.
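To make the "lossy" point concrete, here is a minimal sketch using Python's standard `statistics` module. The two lists are made-up illustration data, not anything from the thread. They have the same mean and the same median, yet they are obviously different series.

```python
from statistics import mean, median

# Two hypothetical series, chosen so that their one-number summaries collide.
a = [1, 2, 3, 4, 5]
b = [-100, 2, 3, 4, 106]

same_mean = mean(a) == mean(b)        # both means are 3
same_median = median(a) == median(b)  # both medians are 3
same_data = a == b                    # the underlying series differ

print(same_mean, same_median, same_data)
```

Once you only have the summary, there is no way to tell `a` from `b`, which is the sense in which the compression is lossy.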

Trying to come up with a good general-purpose way to reduce/compress/aggregate data via a lossy algorithm is intractable.

While that all might sound obvious, it can be very hard to internalise this.

(And that’s before getting into the motivated reasoning that biased actors [aka normal people] will use to prefer one lossy algorithm over another.)


You can use an increasing number of statistical moments.

https://en.wikipedia.org/wiki/Moment_(mathematics)

The arithmetic mean is one of them, which would be an argument in favor of it.
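A rough sketch of what "an increasing number of moments" could look like in practice, assuming population (divide-by-n) central moments. The helper names `central_moment` and `summary` and the sample data are made up for illustration. Each extra moment lets the summary tell apart series that the previous moments could not.

```python
def central_moment(xs, k):
    """k-th central moment of xs, dividing by n (population convention)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def summary(xs, order):
    """Mean followed by the central moments 2..order."""
    m = sum(xs) / len(xs)
    return [m] + [central_moment(xs, k) for k in range(2, order + 1)]

# Hypothetical data: same mean, very different spread.
a = [1, 2, 3, 4, 5]
b = [-100, 2, 3, 4, 106]

print(summary(a, 1), summary(b, 1))  # first moment only: identical
print(summary(a, 2), summary(b, 2))  # adding the variance separates them
```

With only the first moment the two series are indistinguishable. The second moment already separates them, at the cost of carrying one more number around.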

I don't think it is an argument in favour of it.

arith-mean = E[x], the first moment of x

geo-mean = exp(E[log x]), so log geo-mean = E[log x], the first moment of log x

Both preserve the same amount of information, but the arithmetic mean preserves additive structure, whereas the geometric mean preserves multiplicative structure.
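One way to read "preserves additive vs. multiplicative structure", as a small sketch: n times the arithmetic mean gives back the sum, while the geometric mean raised to the n-th power gives back the product. The function names and the growth factors below are made up for illustration, and the geometric mean here assumes strictly positive inputs.

```python
import math

def arith_mean(xs):
    """E[x]: the first moment of x."""
    return sum(xs) / len(xs)

def geo_mean(xs):
    """exp(E[log x]): requires every x > 0."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical growth factors: +50%, -50%, then doubling.
xs = [1.5, 0.5, 2.0]
n = len(xs)

# Additive structure: the arithmetic mean reconstructs the sum.
sum_ok = math.isclose(n * arith_mean(xs), sum(xs))

# Multiplicative structure: the geometric mean reconstructs the product.
prod_ok = math.isclose(geo_mean(xs) ** n, math.prod(xs))

print(sum_ok, prod_ok)
```

Each mean is a single number, so neither keeps more information than the other. They just keep different things.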

I didn't want to imply that there is a loss of information. Yes, it's one level up in the hyperoperation chain, I suppose, but what I meant is that it's not a typical way of doing statistics, especially the higher moments.