Hacker News new | ask | show | jobs
by joeyo 1425 days ago
There are many ways to think about the Kalman filter. Here are a few that I like:

* You can think of it as a Bayesian update process for linear Gaussian systems. That is: given a prior belief of the state of a system (and an uncertainty about that belief), and a measurement about the system (and uncertainty about that measurement), the Kalman filter tells you how to combine the prior with the measurement. This is very hard to do in general, but has an exact solution if your system is Linear-Gaussian. That's magical!

* You can also think of it as a "better way to average". If I gave you two quantities that reflected some "true" value and asked you what the true value was, you would probably average them. The Kalman filter does you one better, because it tells you to average the two quantities weighted by how confident you feel about each one.

* If you like control theory, you can think of the the Kalman filter as the dual of the Linear-Quadratic Regulator. That is, the KF is the optimal state estimator for Linear Gaussian systems in the same way that the LQR is the optimal (minimum cost) controller for LG systems. It's also worth pointing out that if the system you are estimating is being controlled, the KF can incorporate control inputs as well!

4 comments

* You can think of it as a factor graph with linear residuals and Gaussian noise functions in factors that connect a chain of variables, with all but the most recent variable marginalized. It’s a well known fact that linear, Gaussian factors result in a closed-form expression that gives the optimal maximum a posteriori estimate. The Kalman filter exploits this very special case. You can also write a LQR down with a factor graph (as the parent commented, the KF and LQR are duals).
That's the same as the first point in the parent comment's list: a factor graph is a visualisation of the conditional probability distribution. But yes it is very helpful to draw out the factor graph (or Bayesian graph) for the Kalman filter, probably more useful than just writing out the equations.
By the way (as if my original comment above isn't already nitpicky enough, this is even worse...):

It bugs me when people use the word "optimal" in the Gaussian / Bayesian formulation. As the top-level comment above says, if you assume the various prior and conditional distributions are Gaussian then the posterior distribution is Gaussian too. This is not optimal, it's exact, just like you wouldn't say x=2 is optimal solution to x+1=3.

It is the optimal solution in the quadratic optimisation formulation, as the top-level comment also correctly said.

I'm not a mathematician at all (mechanical engineer), but to me, "exact" sounds like "deterministic" as an opposition to stochastic.

I though optimal conveyed the idea of "literally the best possible solution but you're still in the presence of a fully random system here".

Which might be the wrong interpretation, but hopefully it explains why some people (who aren't necessarily familiar with rigorous mathematics) use optimal.

I do see your point. But if you're talking about a probability or probability distribution, it can still be an exact solution to a model. For example, if I throw two standard dice, what is the probability of throwing two sixes? The answer is 1/36. To me, it sounds odd to describe 1/36 as the "optimal" solution to that problem, even though it's stochastic. Even "exact" solution is a bit odd, I'll concede, but a lot less so. "The solution" or "the answer", with no more qualification needed, sounds best to me.
That's a known probability distribution. The optimal is about being the minimum variance unbiased estimator for the unknown probability distribution.
1/36 is indeed the most optimal estimator of the expected frequency of two sixes, it's not odd at all.
Oh yes, of course.
I love definition 2. No buzzwords and this is what it does. Maybe it is too subtle for most.
Yes, point 2 is really the elevator pitch of Kalman filters. It enables sensor fusion, averaging a fast noisy sensor with a slow accurate sensor, or even add a model as one less confident input to the filter.

As pointed out elsewhere in this thread, demonstrating a Kalman filter with only one input doesn’t really show their real potential.

Regarding the model as an optional less confident input:

I mostly know KFs from the third approach, control theory, state observers, state space representation, where the model is a central concept.

Can you actually use KF without a model? I'm curious, would you point out some references?

Though I suspect the Kalman filter only can be described as averaging, if the distributions are mono-modal?

(Or is Kalman filter defined to only work for Gaussian distributions anyway, and otherwise you call it Bayesian updating?)

I think state of the art can work with something different than Gaussian distribution, either in the input data or the predicted one (which, with non linear models can be very unregular). Isn't that the point of the unscented kalman filter and all the ones that generate lots of hypotheses to check the target distribution. I probably don't use the correct vocabulary here... Sorry.
But doesn't that only describe one part of it. Since it also gives you a way to automatically figure out the "how confident you are" about each value over time ?
Yes I agree, it allows you to find a confidence as you go.
I'm struggling to see the advantage over the arithmetic mean if I don't have a model of the true value. How is the confidence estimated for each value? given the example from the article where the variance of the temperature sensor is considered to be constant.
Imagine you want to know the depth of liquid in a tank you're filling up - and you've got a noisy depth gauge and a noisy flow-rate meter.

If you merely take the mean of the last 10 depth gauge readings, your smoothed depth reading will always lag behind the true depth, being about 5 readings out-of-date.

By fusing together the noisy depth gauge, and the noisy flow-rate meter, and a model saying how fast depth rises with flow, you can average out the noise without creating the same level of lag.

This is useful in applications like GPS receivers - there's noise so you do need filtering, but for driving through complex junctions, the last thing you want is a 5 second delay!

If you have separate estimates / noisy measurements of the same quantity, and also have some estimate of their variance (e.g. pollsters or weather forecasters with different accuracy track records), you'd probably want a weighted mean (inversely by their variance) rather than a simple mean. If you have a system that runs for some time, you'd usually (but not always) want an average with higher weights placed on the recent past than on the distant past. (The arithmetic mean is the optimal estimate in a model where the true value is unknown but time-constant.)

The Kalman filter is "just" a recipe to calculate the weights for optimal estimation, given different model assumptions. The difficult part is not applying the formulas, but mostly in coming up with a good model of the movements of the things you want to measure/estimate.

At a high level, the Kalman filter offers two advantages over a simple arithmetic mean (which correspond to its two steps, update and observe):

* The Kalman filter can account for changes in state. This is useful the value you're measuring is changing over time (e.g. the position of a moving vehicle, or water level in a bucket being filled mentioned in another comment). The type of variation that can be handled by the Kalman filter will depend on what assumptions you make - how you configure it, if you like. Often you would account for velocity, but you can go a step further and account for acceleration too. A moving average will always lag behind a bit, even when movement is totally linear and there is no error in the measurement at all.

* A Kalman filter effectively gives more weight to recent measurements and less weight to older ones. e.g. a moving average with period 4 will have weights of (..., 0, 0, 0.25, 0.25, 0.25, 0.25), whereas a Kalman filter might effectively have weights of (..., 0.125, 0.25, 0.5). You can even have a continuously-varying time gap between measurements, so if a new measurement comes in then it will update the estimate a lot if the previous measurement was a long time ago, whereas if the previous measurement was extremely recent then it will effectively be averaged with the new one. One downside of this is that if the state changes a lot very suddenly then the Kalman filter will remember the old state, to some extent, whereas a moving average will forget it entirely once it drops out of the window; but this is offset to some extent by the previous point.

If you don't have a model of the various confidences then you'll need to guess them - if you do a really bad job then it might work out worse than a moving average. But if the Kalman filter's working better then you can always make use of the state estimate while ignoring the computed uncertainty.

i've always just seen it as a wrapper for updating multivariate linear models when the inputs have noise that can be modeled with gaussians.

linear models meaning anything that can be described as a matrix vector multiply. (which is a lot)

but i'm no expert.