by girzel 856 days ago
No thread on Kalman Filters is complete without a link to this excellent learning resource, a book written as a set of Jupyter notebooks:

https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Pyt...

That book mentions alpha-beta filters as sort of a younger sibling to full-blown Kalman filters. I recently had need of something like this at work, and started doing a bunch of reading. Eventually I realized that alpha-beta filters (and the whole Kalman family) are very focused on predicting the near future, whereas what I really needed was just a way to smooth historical data.

So I started reading in that direction, came across "double exponential smoothing" which seemed perfect for my use-case, and as I went into it I realized... it's just the alpha-beta filter again, but now with different names for all the variables :(
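To make that "same thing, different names" observation concrete, here is a minimal sketch, assuming the Holt linear-trend form of double exponential smoothing and a unit time step. The function names, the sample data, and the zero-trend initialization are illustrative choices, not anything from the book. Under those assumptions the two recursions agree exactly when alpha equals the level smoothing factor and beta equals alpha times the trend smoothing factor.

```python
def holt_smooth(ys, a, g):
    """Double exponential smoothing (Holt's linear-trend form).

    a: level smoothing factor, g: trend smoothing factor.
    Initialization (level = first sample, trend = 0) is an assumption.
    """
    level, trend = ys[0], 0.0
    out = [level]
    for y in ys[1:]:
        prev = level
        level = a * y + (1 - a) * (prev + trend)
        trend = g * (level - prev) + (1 - g) * trend
        out.append(level)
    return out


def alpha_beta(zs, alpha, beta, dt=1.0):
    """Alpha-beta filter: predict with constant velocity, correct with the residual."""
    x, v = zs[0], 0.0
    out = [x]
    for z in zs[1:]:
        x_pred = x + dt * v          # predict position
        r = z - x_pred               # residual (innovation)
        x = x_pred + alpha * r       # correct position
        v = v + (beta / dt) * r      # correct velocity
        out.append(x)
    return out


data = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 7.3]  # made-up measurements
a, g = 0.5, 0.3
h = holt_smooth(data, a, g)
ab = alpha_beta(data, alpha=a, beta=a * g)

# Largest difference between the two outputs; only floating-point noise remains.
max_diff = max(abs(p - q) for p, q in zip(h, ab))
print(max_diff)
```

Substituting the level update into the trend update gives trend = trend + a * g * residual, which is the alpha-beta velocity correction with beta = a * g.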

I can't help feeling like this entire neighborhood of math rests on a few common fundamental theories, but because different disciplines arrived at the same systems via different approaches, they end up sounding a little different and the commonality is obscured. Something about power series, Euler's number, gradient descent, filters, feedback systems, general system theory... it feels to me like there's a relatively small kernel of intuitive understanding at the heart of all that stuff, which could end up making glorious sense of a lot of mathematics if I could only grasp it.

Somebody help me out, here!

7 comments

Incidentally this is why people miss the mark when they get mad about mathematicians using single letter variable names. Short names let you focus on the structure of equations and relationships, which lets you more easily pattern match and say "wait, this is structurally the same as X other thing I already know but with different names". It's not about saving paper or making it easier to write (it is not easier to write Greek letters with super/subscripts in LaTeX using an English keyboard than it would be to use words). It is about transmitting a certain type of information to the reader that is otherwise very difficult to transmit.

While it uses letters so it looks vaguely like writing, math notation is very pictorial in nature. Long words would obscure the pictures.

I disagree. Single letter variables are meaningless. In order to get the big picture, you have to remember what all those meaningless letters stand for. Using meaningful variables would make this easier.
If you work with them long enough it becomes second nature to read them, and then it is easier to manipulate and compose them. The rest of the context is the background knowledge to understand the pithy core equations. Papers are for explaining concepts, equations are for symbolic manipulation. Meaningful variable names would be a middle ground and not good at either, except to help someone not familiar with the subject to understand the equation, but a lot of the symbols are so abstract that they really need to be explained in more detail elsewhere or would be arbitrarily named.
If you're in an abstract/general mathematical function, then sure: single letters. If you're doing more business logic kind of stuff (iterating through a list of db/orm objects or processing a request body) then the names should be longer.
Mathematics doesn't usually deal with databases or HTTP requests.
Often the actual meaning of the symbols is subordinate to the point you're trying to convey. e.g. I can tell you that `integrate(boundary(Region), form) = integrate(Region, differentiate(form))`, which is great and all, but I might write `<∂M|w> = <M|dw>` because what I'm trying to tell you is that you should think of these things as a dual-pairing of vector spaces (via integration) and that ∂ and d are somehow adjoint. They're both Stokes' theorem, but the emphasis is different, and in either case the hard part is the mountain of work it takes to define what the words even mean (limits, and integrals, and derivatives, and vectors, and covectors, and manifolds, and tangent spaces, and vector fields, and covector fields, and partitions of unity, and symmetric and alternating forms, and exterior derivatives, etc. etc. all so you can finally write one equation, which really just says that all the swirlies inside a region cancel out so if you want to add them all up, you can just add up the outer swirly).
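For readers who want to see the two spellings side by side, here is the same statement in conventional notation. This is only a typeset restatement of the comment above; writing the form as \omega (the comment uses w) is a notational assumption.

```latex
\int_{\partial M} \omega \;=\; \int_{M} d\omega
\qquad\Longleftrightarrow\qquad
\langle \partial M \mid \omega \rangle \;=\; \langle M \mid d\omega \rangle
```

The left form reads as a computation to carry out; the right form makes the pairing and the adjointness of \partial and d the thing you notice first.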

The thing about math is you need to be comfortable viewing the same concept through a bunch of different lenses, and various notations are meant to help you do that by emphasizing different aspects of "the picture" you're looking at.

Ok, I can accept that. At the same time, my impression is that mathematicians always use single-letter variables.

It's like either they're not clear who their audience is or they're afraid to get off the beaten path. If they're explaining a classic algorithm, they use the common, single-letter variables instead of replacing them with meaningful names.

IMO your comment seems not to be addressing the point made in its parent comment. To make the point again with different words:

- Using long descriptive variable names would give them meaning, and make the particular equation/expression easier to understand or apply.

- Using short single-letter variable names allows you to forget the meaning of the variables and see the underlying structure, thus making the expression easier to connect to other situations (with completely unrelated meanings) that happen to have the same underlying structure. (The letters being meaningless, or at least not carrying their meaning so strongly, is a feature, not a bug.)

(See the highest-voted answer to https://math.stackexchange.com/questions/24241/why-do-mathem... for example.)

(Another way of seeing the distinction is whether you consider the equation to be the final result, to be used and applied, or as a starting point, to be manipulated further.)

Ok, that makes sense. Then maybe use single-letter variables for when working on something, and meaningful variable names for when publishing.

Edit: I realise, like someone mentioned in another comment, that sometimes you also want to make the pattern visible to readers.

Yeah, that is why no one should use "i" and "j" in their loops, but instead choose "outerLoopIterator" and "innerLoopIterator" /s
You're looking for the theory of linear (or nonlinear) dynamical systems. Unfortunately it's not one kernel of intuition backed by consistent notation, it's many with no consistency. A good course on controls and signals/systems will beat those intuitions into you and you learn the math/parlance without getting attached to any one notational convention.

The real intuition is "everything is a filter." Everything else is about analysis and synthesis of that idea.

Maybe check out Probabilistic Robotics by Dieter Fox, Sebastian Thrun, and Wolfram Burgard. It has a coherent Bayesian formulation with consistent notation on many Kalman-related topics. Also with the rise of AI/ML, classic control theory ideas are being merged with reinforcement learning.
I agree that Bayesian filtering is the most general and logical approach. There are Bayesian derivations of the Kalman filter too.

Here is a broad survey: https://people.bordeaux.inria.fr/pierre.delmoral/chen_bayesi...

Thanks for the recommendation! It would never have occurred to me to look at robotics, but I can understand why that's very relevant.

I read Feedback Control for Computer Systems not too long ago, which felt like yet another restatement of the same ideas; I guess that counts as "classic control theory".

If Q and R are constant (as is usually the case), the gain quickly converges, such that the Kalman filter is just an exponential filter with a prediction step. For many people this is a lot easier to understand, and even matches how it is typically used, where Q and R are manually tuned until it “looks good” and never changed again. Moreover, there is just one gain to manually tune instead of multiple quantities Q and R.
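A minimal sketch of that convergence, assuming the simplest possible case: a scalar random-walk state (so the prediction step leaves the estimate unchanged) with constant process noise q and measurement noise r. The function names and the numbers are illustrative assumptions. The gain sequence depends only on q, r and the initial variance, never on the measurements, so it can be computed up front and compared with the closed-form steady-state value.

```python
import math


def kalman_gains(q, r, p0=1.0, steps=100):
    """Gain sequence of a scalar Kalman filter for a random-walk state."""
    p = p0
    gains = []
    for _ in range(steps):
        p_pred = p + q                 # predict: variance grows by q
        k = p_pred / (p_pred + r)      # Kalman gain
        p = (1 - k) * p_pred           # update: variance shrinks
        gains.append(k)
    return gains


def steady_state_gain(q, r):
    """Fixed point of the recursion above (positive root of p^2 - q*p - q*r = 0)."""
    p_pred = (q + math.sqrt(q * q + 4 * q * r)) / 2
    return p_pred / (p_pred + r)


def exponential_filter(zs, k):
    """Fixed-gain filter: what the Kalman filter reduces to once k has converged."""
    x = zs[0]
    out = [x]
    for z in zs[1:]:
        x += k * (z - x)
        out.append(x)
    return out


gains = kalman_gains(q=0.01, r=1.0)
k_inf = steady_state_gain(q=0.01, r=1.0)
print(gains[0], gains[-1], k_inf)
```

In this sketch only the ratio q/r determines the converged gain, which is one way to see why tuning a single gain by hand can stand in for tuning Q and R separately.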
Hey, I had very similar thoughts many years ago! The trick is yes, many filters boil down to alpha/beta, and the Kalman filter is (edit: can be) really a way to generate those constants given a (linear) model (a set of equations describing the dynamics, i.e. the future states) and good knowledge of the noise (variance) in the measurements. So if the measurements always have the same noise it will just reduce the constants over time, and it is only really useful when the measurement accuracy can be determined well and also changes a lot.
Interesting. Are you characterizing Kalman filters mostly as systems of control/refinement on top of alpha-beta filters?

I do feel like the core of it is essentially exponential/logarithmic growth/decay, with the option to layer multiple higher-order growth/decay series on top of one another. Maybe that's the gist...

Yeah, because a lot of times the equations that fall out of the KF look the same, only with variable values for alpha/beta.
When you start dealing with linear systems and disturbances, you end up with basically matrix math and covariance in some form or another.

The thing about the Kalman filter is that it's pretty well known and exists in many software packages (just like PID), so it's fairly easy to implement. But because noise is often not Gaussian, and systems are often not linear, it's more of a "works well enough" solution for most applications.

There is no better smoother than a future predictor. I'm not entirely sure what the issue is here.