by rsp1984 520 days ago
I say this whenever the topic of Kalman Filters comes up:

If you're learning the Kalman Filter in isolation, you're kind of learning it backwards and missing out on huge "aha" moments that the surrounding theory can unlock.

To truly understand the Kalman Filter, you need to study Least Squares (aka linear regression), then recursive Least Squares, then the Information Filter (which is a different formulation of the KF). Then you'll realize the KF is just recursive Least Squares reformulated in a way to prioritize efficiency in the update step.

This PDF gives a concise overview:

[1] http://ais.informatik.uni-freiburg.de/teaching/ws13/mapping/...
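To make the "KF is just recursive Least Squares" claim concrete, here is a minimal sketch for a one-parameter model y ≈ theta*x. The model, the function names and the sample data are all assumptions made up for illustration, not taken from the PDF. It fits the slope once from all the data and once sample by sample, and the two answers agree.

```python
def batch_least_squares(xs, ys):
    """Slope theta minimizing sum((y - theta*x)^2), using all data at once."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def recursive_least_squares(xs, ys, theta=0.0, p=1e9):
    """Same fit, processed one sample at a time.

    theta is the running estimate and p its (scaled) variance. The huge
    initial p is an assumption standing in for "no prior knowledge".
    """
    for x, y in zip(xs, ys):
        denom = 1.0 + x * p * x
        k = p * x / denom                    # gain: how much to trust this sample
        theta = theta + k * (y - x * theta)  # correct the estimate by the residual
        p = p / denom                        # uncertainty shrinks with every sample
    return theta, p


# Hypothetical data, roughly y = 2x with some noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
theta_batch = batch_least_squares(xs, ys)
theta_rls, p_rls = recursive_least_squares(xs, ys)
print(theta_batch, theta_rls)
```

The gain-and-residual update in the loop has the same shape as a Kalman measurement update for a state that does not change over time.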

10 comments

I appreciate you taking the time to help people understand higher level concepts.

From a different perspective... I have no traditional background in mathematics or physics. I do not understand the first line of the pdf you posted nor do I understand the process for obtaining the context to understand it.

But I have intellectual curiosity. So the best path toward understanding, for me, is one that can maintain that curiosity while making progress. I can reread Six (Not So) Easy Pieces and not understand any of it and still find value in it. I can play with Arnold's cat and, slowly, through no scientific rigor other than the curiosity of the naked ape, I can experience these concepts that have traditionally been behind gates of context I do not possess keys to.

http://gerdbreitenbach.de/arnold_cat/cat.html

You aren’t supposed to understand things if you don’t know about them. That’s how it works.
With no mathematical rigor there is no mathematical understanding. You are robbing yourself, as the concepts are meaningless without the context.

Truly appreciate the power of linear approximations by going through algebra, appreciate the tricks of calculus, marvel at the inherent tradeoffs of knowledge with estimator theory, and see the joy of the central limit theorem being true. All of this knowledge is free, and much more interesting than a formal restatement of "it was not supposed to rain, but I see clouds outside, I guess I'll expect light rain instead of a big thunderstorm".

> With no mathematical rigor there is no mathematical understanding. You are robbing yourself, as the concepts are meaningless without the context.

I will think more about this, but I'm not sure I agree. I have enjoyed reading Feynman talk about twins and one going on a supersonic vacation without understanding the math. Verisimilitude allows a modeling of understanding with a scalar representation of scientific knowledge, so why not?

Of course I would like to understand the math in its purest forms, just the same as I wanted to read 1Q84 in Japanese to be able to fully experience it in its purest form, but my life isn't structured in a way where that is realistic even if the knowledge of the Japanese language is free.

> Truly appreciate the power of linear approximations by going through algebra, appreciate the tricks of calculus, marvel at the inherent tradeoffs of knowledge with estimator theory, and see the joy of the central limit theorem being true.

I can't even FOIL, so the journey toward understanding can feel unattainable with the time I have. This absolutely may be a limiting belief, but the concept of knowledge being free ignores the time cost for those exploring these topics outside of academia or a professional setting.

Indeed everything has an opportunity cost, and every life has its own priorities.

Since you mention Feynman, I would like to observe that many expositors who target the lay audience have the skill of making the audience believe that they have comprehended(1) something of an intellectual world that they have no technical grounding to truly comprehend(2). In my view these are two distinct types of comprehension/understanding. So long as the audience is clear on which type of understanding they are getting, and is not wasting time unwittingly pursuing one type at the expense of the other then I see no harm.

There is a risk however, that the pop expositors will put you in a headspace where even if you are faced with accessible, but type 2, material you will not be familiar with what really constitutes understanding. As a mature age student it took me quite a few years of maths exams to switch from 1 to 2. Nowadays I am more comfortable with admitting that I don't understand some piece of math (for that is the first step on the path to learning) than being satisfied with a pop-expository gist.

I've thought a lot about this exact topic. You need both to do well.

You need handwavy and vague versions of things to understand the shape of them and to build intuition.

Then you need to test the intuition and build up levels of rigor.

Especially in the context of the Kalman Filter. I just helped a bunch of middle school students build a system for field localization and position tracking. They are missing all kinds of knowledge. They don't have linear algebra or a real understanding of what it means for something to be Gaussian, so they have to juggle a bazillion variables. They understand that their estimates and the stuff coming off their sensors vary in quality depending on circumstances, and that gain needs to vary accordingly. They'll never hit the optimum parameters.

But: their system works. They understand how it works (even if they don't know how to quantify how well it works). They understand how changing parameters changes its behavior. When they learn tracking filters and control by root locus and all kinds of things later, they'll have an edge in understanding what things mean and how it actually works. I expect their intuition will give them an easier time in tackling harder problems.

Conversely, I've encountered a bunch of students who know what "multimodal" means but couldn't name a single example in the real world of such a thing. I would argue that they don't even know what they're talking about, even if they can calculate a mixture coefficient under ideal conditions.

There's a lot of fluffy language here that isn't saying much.

Linear algebra is not something that takes years of patient study to gain basic competency. It has almost no prerequisites and can be learned well enough to understand least squares in a focused weekend or two.

Thank you for the encouragement. I will take a week or two and spend some time on focused learning. Do you have any recommendations for where to start?

> With no mathematical rigor there is no mathematical understanding

While I appreciate rigor for really knowing the deep details, it is not only not a requirement for understanding, but a hurdle. A terrible, insurmountable hurdle.

To first have understanding, I need some kind of intuition. Some explanation that makes sense easily. That explanation is, btw, typically what the inventor or discoverer had to begin with, before nailing it down with rigor.

> With no mathematical rigor there is no mathematical understanding. You are robbing yourself, as the concepts are meaningless without the context.

You don't need to know what Gravity is to calculate the time it takes for an apple to fall from a tree. You just need to accept that g = 9.8 m/s^2.
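As a quick illustration of that calculation, here is a sketch that assumes the apple starts at rest and ignores air resistance. The function name and the 4.9 m branch height are made up for the example.

```python
import math

G = 9.8  # m/s^2, accepted as given


def fall_time(height_m):
    """Seconds to fall height_m from rest: h = g*t^2/2, so t = sqrt(2h/g)."""
    return math.sqrt(2.0 * height_m / G)


print(fall_time(4.9))  # hypothetical branch 4.9 m above the ground
```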

You also don't need to understand the chemistry of flour, salt, sugar, sodium, milk and eggs to bake a cake.

> Truly appreciate the power of linear approximations by going through algebra, appreciate the tricks of calculus, marvel at the inherent tradeoffs of knowledge with estimator theory, and see the joy of the central limit theorem being true.

None of these are needed, or even useful, for understanding the Kalman filter.

I think the easiest way depends on your background knowledge. If you understand linearity of the Gaussian distribution and the Bayesian posterior of Gaussians, the Kalman filter is almost trivial.

For (1D) we get the prior from the linear prediction X'1 = X0*a + b, for which mean(X'1) = mean(X0)*a + b and var(X'1) = var(X0)*a^2, where a and b give the assumed dynamics.

The posterior for Gaussians is the precision weighted mean of the prior and the observation: X1 = (1 - K)*X'1 + Y*K, where the weighting K = (1/var(Y))/(1/var(X'1) + 1/var(Y)) = var(X'1)/(var(X'1) + var(Y)), with Y being the Gaussian observation. The posterior variance is var(X1) = (1 - K)*var(X'1).

Iterating this gives the Kalman filter. Generalizing this to multiple dimensions is straightforward given the linearity of multidimensional Gaussians.
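The 1D recipe can be sketched in a few lines. This is an illustration under assumptions, not a reference implementation: the function names, the starting belief and the readings are invented, and the gain is written as the observation's share of the total precision so that a more certain observation pulls the estimate harder. Like the equations above, the prediction has no process noise term, which practical filters usually add.

```python
def predict(mean, var, a, b):
    """Push a Gaussian belief through the linear dynamics x' = a*x + b."""
    return mean * a + b, var * a ** 2


def update(prior_mean, prior_var, y, y_var):
    """Blend prior and observation, each weighted by its precision (1/variance)."""
    k = (1.0 / y_var) / (1.0 / prior_var + 1.0 / y_var)
    mean = (1.0 - k) * prior_mean + k * y
    var = (1.0 - k) * prior_var
    return mean, var


# Hypothetical setup: a vague initial belief, static dynamics (a=1, b=0),
# and three noisy readings with variance 1.
mean, var = 0.0, 100.0
for y in [5.2, 4.8, 5.1]:
    mean, var = predict(mean, var, a=1.0, b=0.0)
    mean, var = update(mean, var, y, y_var=1.0)
    print(mean, var)
```

Each pass through the loop is one filter iteration: the variance shrinks and the mean settles near the readings.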

This is what makes it really simple to me (after I understood it), but things like linearity of (multidimensional) Gaussians and the posterior of Gaussians as such probably are not simple.

I have written down a similar derivation here if anyone is interested: https://ngr.yt/blog/kalman/

What you write is simple. But your scalar model suppresses the common situation of a measurement matrix with output dimension less than state dimension. Exactly how the Kalman gain formula works under this setting I'm less clear on. Beyond that, additional insight is needed when the measurement function is non-linear and K = P_xy P_y^{-1} as in the UKF. At least I get stuck there, with little formal statistics work.

Good catch, indeed a measurement matrix is needed if the state and measurement are of different dimensions or require a (linear) transformation. For that use Y = H*z where H is the measurement matrix and z is the observation vector.

For UKF the Y is still a multidimensional Gaussian and computing K is the same. The mean and covariance of Y is computed from Z and the nonlinear measurement function using the unscented transform.
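For the case where the measurement has fewer dimensions than the state, here is a small sketch using the standard textbook convention z = H*x + noise, which is not necessarily the Y = H*z notation used above. The state is assumed to be [position, velocity] with only position measured; the function name and all numbers are made up. The gain is computed as K = P_xy P_y^{-1}, which in the linear case is the same thing as P H^T (H P H^T + R)^{-1}.

```python
def kalman_update_2state_1meas(x, P, z, H, r):
    """One measurement update for a 2D state and a scalar measurement.

    x: [x0, x1] prior mean, P: 2x2 prior covariance (nested lists),
    z: scalar measurement, H: [h0, h1] measurement row, r: measurement variance.
    """
    # Cross-covariance between state and predicted measurement: P_xy = P H^T
    p_xy = [P[0][0] * H[0] + P[0][1] * H[1],
            P[1][0] * H[0] + P[1][1] * H[1]]
    # Variance of the predicted measurement: P_y = H P H^T + r
    p_y = H[0] * p_xy[0] + H[1] * p_xy[1] + r
    # Gain K = P_xy P_y^{-1}; P_y is a scalar here, so this is a division
    K = [p_xy[0] / p_y, p_xy[1] / p_y]
    innovation = z - (H[0] * x[0] + H[1] * x[1])
    x_new = [x[0] + K[0] * innovation, x[1] + K[1] * innovation]
    # Posterior covariance: P - K P_y K^T
    P_new = [[P[i][j] - K[i] * p_y * K[j] for j in range(2)] for i in range(2)]
    return x_new, P_new, K


x_new, P_new, K = kalman_update_2state_1meas(
    x=[0.0, 1.0], P=[[4.0, 2.0], [2.0, 3.0]], z=2.5, H=[1.0, 0.0], r=1.0)
print(x_new, P_new, K)
```

Velocity is never measured directly, yet it still gets corrected, because the off-diagonal term of P says position and velocity errors are correlated. In a UKF the same K = P_xy P_y^{-1} line is kept, and P_xy and P_y are estimated from sigma points instead of from H.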

You can keep saying this, but this “esoteric” math is often too much for the people actually implementing the filters.

It's bread and butter math for physics, engineering (trad. engineering), geophysics, signal processing, etc.

Why would anyone have people implementing Kalman filters who found the math behind them "esoteric"?

Back in the day, in my wet behind the ears phase, my first time implementing a Kalman Filter from scratch, the application was to perform magnetic heading normalisation on mag data from an airborne geophysical survey - 3 axis nanotesla sensor inputs on each wing and tail boom requiring a per survey calibration pattern to normalise the readings over a fixed location regardless of heading.

This was buried as part of a suite requiring calculation of the geomagnetic reference field (a big parameterised spherical harmonic equation), upward, downward and reduce to pole continuations of magnetic field equations, raw GPS post processing corrections, etc.

where "etc" goes on for a shelf full of books with a dense chunk of applied mathematics

FWIW, I think I understand Kalman filters quite well, but the linked PDF is hard for me to follow, and I'd really struggle to understand it if I didn't already know what it's saying.

I think the lesson there is that the Kalman filter is simpler in the "information form" where the Gaussian distribution is parameterized using the inverse of the covariance matrix.

If you don't already know what that means, you likely won't get much out of that. I think the more intuitive way is to first understand the 1D case, where the filter result is a weighted average of the prediction and the observation and the weights are the multiplicative inverses of the respective variances (the less uncertainty/"imprecision", the more weight you give).

In the multidimensional case the inverse is the matrix inverse but the logic is the same.
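The 1D weighted average can be written directly in terms of inverse variances. A minimal sketch, with an invented function name and invented numbers, assuming the estimates being combined are independent:

```python
def fuse(estimates):
    """Combine independent Gaussian estimates given as (mean, variance) pairs.

    Information (1/variance) simply adds up, and the fused mean is the
    information-weighted average of the individual means.
    """
    information = sum(1.0 / var for _, var in estimates)
    weighted_sum = sum(mean / var for mean, var in estimates)
    return weighted_sum / information, 1.0 / information


prediction = (10.0, 4.0)   # hypothetical: fairly uncertain prediction
observation = (12.0, 1.0)  # hypothetical: more trustworthy observation
print(fuse([prediction, observation]))
```

The result lands closer to the observation because it carries four times the information of the prediction. In several dimensions the divisions become multiplications by inverse covariance matrices.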

More generally the idea is to statistically predict the next step from the previous and then balance out the prediction and the noisy observation based on the confidence you have in each. This intuition covers all Bayesian filters. The Kalman filter is a special case of the Bayesian filter where the prediction is linear and all uncertainties are Gaussian, although it was understood this way only well after Kalman invented the eponymous filter.

Not sure how intuitive that is either, but don't be too worried if these things aren't obvious, because they aren't until you know all the previous steps. To implement or use a Kalman filter you don't really need this statistical understanding.

If you prefer to understand things more "procedurally", check out the particle filter. It's conceptually the Bayesian filter but doesn't require the mathematical analysis. That's the way I really understood the underlying logic.
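For the procedural view, here is a rough sketch of a bootstrap particle filter tracking a 1D position. Everything specific in it is an assumption chosen for the demo: the constant drift of 1.0 per step, the noise levels, the particle count and the function names.

```python
import math
import random


def particle_filter_step(particles, y, rng, velocity=1.0,
                         process_std=0.5, obs_std=1.0):
    """One predict / weight / resample cycle for a 1D position."""
    # 1. Predict: move every particle through the assumed noisy dynamics.
    moved = [p + velocity + rng.gauss(0.0, process_std) for p in particles]
    # 2. Weight: score each particle by how well it explains the observation.
    #    The tiny constant keeps the total weight above zero.
    weights = [math.exp(-0.5 * ((y - p) / obs_std) ** 2) + 1e-300
               for p in moved]
    # 3. Resample: draw a new population in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def run_demo(steps=30, n_particles=500, seed=0):
    """Simulate a drifting target and track it; returns (truth, estimate)."""
    rng = random.Random(seed)
    truth = 0.0
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]
    for _ in range(steps):
        truth += 1.0 + rng.gauss(0.0, 0.5)  # the real system moves
        y = truth + rng.gauss(0.0, 1.0)     # we only see a noisy reading
        particles = particle_filter_step(particles, y, rng)
    estimate = sum(particles) / len(particles)
    return truth, estimate


truth, estimate = run_demo()
print(truth, estimate)
```

The cloud of particles plays the role of the probability distribution, so the "balance prediction against observation" step happens by survival of the particles that fit the reading, with no Gaussian algebra involved.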

I understood it as reestimation with a dynamic weight factor based on the perceived error factor. I know it’s more complex than that but this simplified version I needed at one point and it worked.

I found this article invaluable for understanding the Kalman filter from a Bayesian perspective:

Meinhold, Richard J., and Nozer D. Singpurwalla. 1983. "Understanding the Kalman Filter." American Statistician 37 (May): 123–27.

You are probably right, but many folks following your advice will give up halfway through and never get to KF.

This is more or less the approach that is taken by Dan Simon's "Optimal State Estimation" book that I came here to recommend: https://academic.csuohio.edu/simon-daniel/state-estimation/ All the prerequisites are covered prior to introducing the Kalman filter in chapter 5. Although Simon does not go through the information filter before introducing the Kalman filter, he discusses it later.

However, to understand recursive least squares, in particular the covariance matrix update, you're going to need a firm grounding in probability and statistics. Simon makes the case that probability theory is a less strict prerequisite than multiple-input-multiple-output (state space) linear systems theory (for which I can recommend Chen's "Linear System Theory and Design").

So I would argue that to understand Kalman filters you need to know state space systems modelling, in both continuous and discrete time, including discretisation methods (this provides the dynamics that describe the time-update step), plus you need to know enough multivariate statistics to understand how the Kalman filter propagates the Gaussian random variables (i.e. the Kalman state) through the dynamics and back and forth through the measurement matrices.

That’s the way one should learn any subject, be it physics, chemistry, math, etc. However, textbooks don’t follow that technique.

I strongly recommend Elements of Physics by Millikan and Gale for anyone who wants to learn pre-quantum physics this way.

Are you me? I feel like I say this every time too! Perfectly captured.

Indeed. I always recommend Time Series Analysis by Hamilton for this reason. KF comes up as a natural way to solve linear models.

(Laughs in control theory)