| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Ragib_Zaman 2651 days ago

An 'online' mean calculation might resolve some of the common issues. Something like -

  def mean(ls):
      mu = 0.0
      for i, x in enumerate(ls):
          mu = i/(i+1) * mu + x/(i+1)
      return mu

4 comments

srean 2651 days ago

Even better if you can afford to sort the data and use a high precision type for mu. The algorithm will not be online any more and for python real numbers are usually double in practice. But in a constraint system where one has to work with float32, just using float64 for mu helps.

link

TAForObvReasons 2651 days ago

    mu = mu + (x - mu) / (i+1)

This is more natural when written in C:

    mu += (x - mu) / (i+1)

link

xurukefi 2651 days ago

How much precision loss is to be expected here?

link

Ragib_Zaman 2651 days ago

Good point. The precision loss will be pretty bad if the numbers added above have greatly different magnitudes, which could occur if the next number (x) is very different to the current mean (mu), or if the list is very long. The first issue could be partially mitigated by sorting the list first. If the list is very long, the above algorithm will necessarily add numbers of quite different magnitudes, but employing something like Kahan summation - https://en.wikipedia.org/wiki/Kahan_summation_algorithm , could partially mitigate that issue as well.

link

kevin_thibedeau 2651 days ago

It is proportional to the number of terms in the sum. Kahan summation has a tighter error bound.

link

ken 2651 days ago

How is that different from the second example?

link

antirez 2651 days ago

That you don't have to know the number of elements to sum at start. But I think the precision loss is even greater this way.

link