Hacker News new | ask | show | jobs
by kloch 1256 days ago
> if you add very big values to very small values, you can get inaccurate results (the small numbers get lost!)

There is a simple workaround for this:

https://en.wikipedia.org/wiki/Kahan_summation

It's usually only needed when adding billions of values together and the accumulated truncation errors would be at an unacceptable level.

2 comments

It can also come up in simple control systems. A simple low-pass filter can fail to converge to a steady state value if the time constant is long and the sample rate is high.

Y += (X-Y) * alpha * dt

When dt is small and alpha is too, the right hand side can be too small to affect the 24bit mantissa of the left.

I prefer a 16/32bit fixed point version that guarantees convergene to any 16bit steady state. This happened in a power conversion system where dt=1/40000 and I needed a filter in the 10's of Hz.

This is a very important application and a tougher problem than most would guess. There is a huge variety of ways to numerically implement even a simple transfer function, and they can have very different consequences in terms of rounding and overflow. Especially if you want to not only guarantee that it converges to a steady-state, but furthermore that the steady-state has no error. I spent a lot of time working on this problem for nanometre-accurate servo controls. Floating and fixed point each have advantages depending on the nature and dynamic range of the variable (eg. location parameter vs scale parameter).
> It's usually only needed when adding billions of values together and the accumulated truncation errors would be at an unacceptable level.

OTOH, it’s easy to implement, so I have a couple of functions to do it easily, and I got quite a lot of use out of them. It’s probably overkill sometimes, but sometimes it’s useful.