Hacker News new | ask | show | jobs
Derive Yourself a Kalman Filter (ngr.yt)
127 points by NougatRillettes 2594 days ago
7 comments

"It's really quite simple. Here, let me show you all these formulas..."

This reminds me of last week. I bought a house and want to get into woodworking so I looked up intro videos on YouTube. "It's easy. Just follow me over to this table saw and router and planer and all these other tools you don't own."

I know I'm not being fair. But a recurring frustration I have is when experts claim it's easy or simple or for beginners and then talk right over you. They don't mean to, but it can be insulting and demoralizing. "well if it's for beginners and I don't know what these glyphs mean, the problem must be me."

So back to the topic. Am I wrong or could you begin with, "a kalman filter is a way to get a guesstimate of a value from different sources where the trust in each source can vary."

There are no Kalman filters for beginners because no one is out there trying to make maths more complicated than it needs to be.

To understand what’s happening you need work linear algebra l, probability theory and distribution theory. There’s no way to explain it without because it makes no sense without. Kalman filters are born of those things.

You could, but you should follow it up with "where one of those sources is your (incomplete, noisy) information about what the value _used_ to be and how it changes over time".

If you just have (say) three different ways of estimating the position of an aeroplane, then you don't need a Kalman filter. The place where the Kalman filter adds value is where you measure its position _repeatedly_ and make use of the fact that it's known to be moving approximately in a straight line.

How do EKF and ensemble Kalman Filters fit into this?
The Kalman filter assumes that all errors are gaussian, that all updates are linear (i.e., new state = linear function of old state), and that observations are linear (i.e., what you measure is a noisy version of a linear function of the state).

The extended Kalman filter allows for state updates and observations to be nonlinear, by the straightforward expedient of replacing them with linear approximations near to the current estimate.

The ensemble Kalman filter also allows for nonlinear updates and observations; instead of keeping track of the expectation and covariance of the state (i.e., everything you need to define a Gaussian model, things that behave nicely under linear transformations but not under nonlinear ones) it keeps track of an "ensemble" of sample values, applies the update and observation functions to those, and then estimates expectations and covariances from this ensemble. It's a sort of Monte Carlo Kalman filter.

A couple of other things worth knowing about:

Intermediate between the extended KF and the ensemble KF is the "unscented Kalman filter". Like the ensemble KF it estimates things using a number of samples; but instead of propagating those samples through repeated steps, it picks the sample points at each step on the basis of the estimated expectation and covariance, and uses them only to compute new expectations and covariances. More expensive than the EKF but copes better with substantial nonlinearities.

Extrapolating beyond the ensemble KF is the "particle filter", which uses the same "track an ensemble of samples" approach but gives up the assumption that all the errors are Gaussian. You need a larger ensemble to get good results, I think, but it can cope with a wider range of scenarios. (I find the name "particle filter" annoyingly distracting; the "particles" are the samples, which I guess you're supposed to think of as a cloud of points in possible-configuration-of-the-system space, and of course it's a "filter" in the same way as the Kalman filter is.)

You may have chanced upon this already but here is a wonderfully explained (IMHO) intro to Kalman filters: https://www.bzarg.com/p/how-a-kalman-filter-works-in-picture...
I don't know man, the link you clicked on says "Derive Yourself a Kalman Filter", so complaining about a formal treatment seems misguided rather than simply unfair.
you are assuming that the reader is familiar with this specific understanding of "derive". someone not familiar with formal methods wouldn't.
what other understanding of derive is there???
Derived data - e.g. indexes or summaries.

Derivative financial instruments - e.g. options, CDSs.

I'm sure there are many more examples.

You could, kind of. Basically, Kalman filter is this: you have 2 noisy data sources with a known real-life connection (like, vehicle position and velocity), so instead of an accurate measurement for each you have 2 "wide" gaussians. When you multiply them, you still get a gaussian, but a much narrower one, meaning: you still don't know the exact position for sure, but (kind of surprisingly) now you know it with a much greater precision than you had with the direct measurement.

In fact, I agree that this one is not really good explanation. I cannot remember, where I saw my favourite one, but there are really plenty of them in the internet, many are well illustrated and pretty clear overall, so if you are actually interested, I think you won't have much trouble understanding it after a few links from the first page of your favorite search-engine.

Hi,

Trying to simplify even further: you don't need 2 sources, just one, for which the distribution of noise is gaussian. And a model of how the data should evolve over time. Straight-line constant-speed for example, but also, static, turning, accelerating/decelerating... You need to be able to predict the next measurement from the previous one. The Kalman filter then uses the difference between the prediction from the model and the measurement, the incertitude on the measurement, and gives you: a smoothed estimation of the 'real' position of the object, an estimation of the 'noisyness' of your input data (i.e. How much does the data fit the model), and the ability to predict next positions accurately...

Well, not sure this word-soup simplifies anything...

Kalman filters are a bit counter intuitive if you don't have a strong background in probabilities... They clicked for me when 1- shown in comparison with other simpler filters 2- I played with each dimension of the formula on a toy example...

They're also very fun because they can be used in so many ways (sensor data fusion, non gaussian noise models, incredibly complex trajectory models...).

"it can be insulting"

Insulting, really? Somebody cares to try to explain you something and the word you can find is "insulting"?

It's like everyone owes it to you to chew knowledge in small enough bits that you can swallow them without any effort.

What a kindergarden.

Well, I dont agree with OP but i think you miss the mark. it's not deigning to provide an explanation that would be insulting, but the (apparent) promise of it being an accessible and intuitive explanation, and quite obviously not being one that OP considered insulting.
I understood OP exactly as you said. This doesn't justify OP's whining, not one bit.
A Kalman filter is a specific instantiation of the Bayes filter, which is very elegant and intuitive. Then once you plug in a bunch of Gaussians and linearity in the Bayes filter and crunch the math, a Kalman filter falls out.

This is how it's treated in the book Probabilistic Robotics by Thrun, Burgard, Fox and I found it to be one of the best treatments on Kalman filters. A really great book overall.

Yeah, this piece reads like the Wikipedia page for Kalman filters. It’s right, but it’s not very informative if you don’t already understand the topic. There’s definitely no attempt to simplify anything here.
Yes, when people try to explain things they will often say it’s easy in hopes that it will help the learner relax and not feel overwhelmed.

But that backfires the moment something is not easy for the learner. It’s completely demotivating.

It’s better in my opinion to say, “it’s okay if you don’t understand everything at first.” If the goal is to keep the learner from being immediately discouraged.

There's a place for both kinds of explanations. This is a good way of explaining what a Kalman filter is to someone who has a mathematical background, just like if you go look up a woodworking video, it might show an interesting way for someone who is familiar with woodworking to build a chair.

Neither is useful to the novice, but an article like this going out of its way to explain, say, matrix notation, or a video about building chairs that spends the first ten minutes explaining how to use a table saw, is less useful and interesting to the experienced reader or viewer.

I suppose I agree that articles like this shouldn't be presented as easy, or for the beginner, but there's a difference between "simple" and "easy".

When it comes to the woodworking: a japanese pull saw ( https://en.m.wikipedia.org/wiki/Japanese_saw ) is a great and fairly cheap way to get started. It enables high accuracy and is still quite fast.

Then you'll need a cheap but sturdy table, a hand plane, some vices, a tape measure, a carpenter's square, a cheap cordless drill. Screwing and glueing is the best way to get solid joints for a novice.

Thanks for your comment (and thanks to everyone who answered below). I have been way too many times in the same situation where someone says something is "trivial" or "easy" so I completely get your point and will try not to make the same mistake for the next posts.

Regarding your question, I think the answers you have had are on point!

If you enjoyed this, you might also like this article from my blog:

http://heinrichhartmann.com/blog/2014/12/11/Generative-Model...

This is a study of generating time series/stochastic processes. The estimators for the parameters lead straight up to Kalman filters. The state space models are taken from the Kalman setup. This was the first time I understood how Kalman filters come about. Its really a three step process:

1. Stationary processes --> Classical parameter estimation.

2. Discrete state space --> Markov models

3. Continues state space --> Kalman filters.

Make sure to read this bit to make the formal explanation easier to understand.

https://aguaviva.github.io/KalmanFilter/KalmanFilter.html

This one is great to me, less formula and more drawings :)

https://www.bzarg.com/p/how-a-kalman-filter-works-in-picture...

A few months ago I try to use Kalman Filter as one of the filters to stabilize the GPS from mobile phones due to various factors: loss of signal, jumpy stationary coordinate, fake GPS, etc. I found that Kalman Filter was not giving me satisfactory result, adjusting its parameters many times without further improvement. Finally, I just need to write my own simple filter, without fancy math filters (kalman/etc) as I can differentiate the speed of car/motorbike/pedestrian, distance of jumpy signal (less than 20 meters in four directions), accuracy of gps signal in the city/suburb, etc.
I think there is no such thing as a conditional random variable. There are conditional probability distributions, but I've never heard of A|B itself being a random variable and I think it doesn't fit the definition. A random variable (I'm not a mathematician) is a function that takes an elementary event and returns the value of the variable.

You may say this is pedantry, but I think it's important to keep track of what is what, especially when being a beginner. You can afford to be sloppy once you're more advanced.

Hm, I googled it now, the issue is discussed in detail here: https://math.stackexchange.com/questions/612468/how-to-forma...
Thanks for your comments! I don't see how what is discussed here conflicts with the notation I introduced into the post, do you still believe there is a soundness issue in what I have written?
I'm just saying that the notation of say A * X|Y * B seemed unfamiliar to me. I only know conditional notation within a P(...). Or an expectation, etc. Apparently your way of writing is used by others as well, but it may be good to know that it is not fully rigorous.

Again, there are different people preferring different presentations. I as a student was often frustrated by abused notations and was often confused by such things when trying to understand something in detail. For a more cursory and "practical" understanding it could be good enough.

> it may be good to know that it is not fully rigorous

What is the problem with A|B=b being a random variable? (Apart from you unfamiliarity with the concept, I mean.)

Edit: I don’t say there are no problems, I ask what do you think the problem is? There is no problem in the discrete case. In the continuous setting things are indeed more complicated (but if the limiting process is well defined there are no issues).

Note that the same lack of rigour that you find in conditional ramdom variables affects conditional probabilities. If you can accept the latter there is no reason to reject the former.

In my very ignorant world view Kalman filters make some prediction from some input. As do various ML techniques. As do various statistical models and techniques. What are some good sources that show the connection between all these techniques and help me pick the right one for specific use cases?

For example, I want to predict a time series let's say the number of visitors of a site. I know some characteristics of the series (periodic, seasonal), but how should I go about it?

> In my very ignorant world view Kalman filters make some prediction from some input.

The word "filter" indicates that it turns one sequence (usually a time series of measurements) into another sequence (an estimate for underlying states). You can think of it as denoising the sequence of measurement by using knowledge about how the underlying system behaves. For example, if our GPS measurement says we suddenly jumped 100 meters compared to a second ago, we can weigh this against our prior knowledge that a car (the underlying system) is not likely to make such a sudden position change.

The Kalman filter weights the incoming measurement against what our model would predict. Both the prediction and the measurement are probabilistic and they are weighted according to the uncertainties of them. The more certain source of information is weighted higher.

> For example, I want to predict a time series let's say the number of visitors of a site. I know some characteristics of the series (periodic, seasonal), but how should I go about it?

That's not what the Kalman filter is for as you are not trying to denoise some sequence of noisy measurements.

Most statistical time series models (e.g. ARIMA) models can be obtained using Kalman filters, if you fit them automatically, rather than choose the p-d-q values by hand. That’s how the auto.arima function in R works.

https://stats.stackexchange.com/questions/316676/arima-vs-ka...

> In my very ignorant world view Kalman filters make some prediction from some input.

This doesn't seem like an accurate view of what Kalman filters do.

Fundamentally, they just combine noisy measurements over time to produce a more precise estimate of what the measurements should be.

There is a prediction step, but that is based on an existing model of how the state should evolve over time.