Hacker News new | ask | show | jobs
by yaakov34 1153 days ago
Can't pass by without making a public service announcement about the average speed example: don't compute integrals using the calculus definition of Sum(f(x_i) * delta). Look up quadrature or numerical integration methods instead.

Although it is true that you can approximate the average speed by taking an average of instantaneous speed measurements, that's usually a very bad way to do it in any real world situation. Numerical difference values are always noisier than the underlying quantity, sometimes to the point of being unusable, so of course if you can just read off the quantity you want directly (total difference over time), you should do that. But even if you can't, you should use a proper integration method instead of the calculus definition.

I have seen the Sum(f(x_i) * delta) calculation in a lot of real-world code. It has bad convergence properties, bad errors when the function has large derivatives, and bad performance when the data has noise. Some of the code I've seen produces garbage results, or has thousands of function evaluations when you need, like, four. "Quadrature? I think I heard that before, but I don't remember what it means."

In summary, please don't compute derivatives as (f(x_i+1)-f(x_i))/delta, or compute integrals as Sum(f(x_i) * delta), and especially, please don't do the first immediately followed by the second. Which also happens. Look up numerical methods instead.

This has been a public service announcement.

3 comments

Isn't it the case that other numerical integration methods only work if you have a f(x) that you can evaluate for any x you want (albeit possible costly)?

It seems to me that in many practical applications, the only thing you have to work with it is samples at discrete moments in time. It certainly seems to be the case here: "I would measure car's speed at every instant and produce an average of those measurements." We only know f(t_0), f(t_1), f(t_2), ... (and if we're lucky t_1-t_0 = t_2-t_1 = t_3-t_2 and so on); we have no way to compute things like f((t_0 + t_1)/2). In that case, how can we improve our calculation?

Even if you're limited to uniform sampling, something as simple as the trapezoid rule will give you quadratic convergence instead of linear for the naive Sum(f(t_i) * delta). In other words, error proportional to 1/n^2, instead of 1/n, where n is the number of samples, which is going to be a huge difference. There are many methods depending on the constraints of your problem - your ability to choose sampling intervals, knowledge of the bounds of your function or its derivatives, etc. The PSA is to study these things, instead of just writing the first thing that seems familiar from a long-ago calculus class.
Huh? Quadrature is a general term for "measuring area". In this context it's a synonym for integration.

I think you are trying to say that it's better to do weighted sums of fewer samples, instead of uniformly weighted Reimann sum. Both are "calculus definition" integration, of course, since calculus is true.

When mathematicians say Quadrature, they mean that if your function is suitably approximated by projecting onto some orthogonal basis functions, you can get very cheap approximations by cleverly expressing those integrals exactly as a linear combination of their values at certain points along the interval. You need very few.

https://en.m.wikipedia.org/wiki/Gaussian_quadrature

It is significantly more subtle than what you are thinking.

From that page:

> In numerical analysis, a quadrature rule is an approximation of the definite integral of a function, usually stated as a weighted sum of function values at specified points within the domain of integration

Simpson's rule, taught in first year calculus, is exact for cubics, with 3 sample points.

> please don't compute derivatives as (f(x_i+1)-f(x_i))/delta

Isn't this exactly what finite differences means? Sure, it's not optimal in all respects, but it's incredibly general and easy to remember.

This calculation amplifies any noise present in the values of the function, often to the point of the output being unusable. There are many methods that can be used to approximate derivatives, depending on the problem. Just as we shouldn't try to invent our cryptographic methods from scratch, we should take advantage of the extensive knowledge already in use for numeric methods.

I've seen naive numeric methods cause everything from jerky motion in video games to incorrect navigation data for cars.

So what would you suggest as a general calculation for finite differences, especially in those cases when only forward differentiation is possible, e.g. with respect to time?
The closest thing to a universal approach would be a Kalman filter. It's usually where you start when you have noisy measurements coming in, and you need to maintain state such as value and derivative.

Since the original question was about computing the velocity of a car, and since I work in the automotive field, let's take a real example: you want to know the approximate position, acceleration, and velocity (linear and angular) of your car. Your inputs are driven wheel speed (noisy, affected by wheelspin), non-driven wheel speed (noisy), accelerometer output (inaccurate, only present for some axes), GPS position (updated occasionally, has errors), and steering angle (pretty accurate, can be put into a chassis dynamics model). Almost certainly, you would use a Kalman filter to estimate the state of the car. Naive approaches such as subtracting two wheel speed values to obtain acceleration will not work well.

My point is that we should remember that numerical algorithms are a developed field with a lot of knowledge, and we should take advantage of the proven approaches. Sometimes, programmers who are not specifically from the physics or numerical fields, and who need to perform some computation, reach for a very simple approach such as the rectangle-rule integrals, and get bad results.

I see -- we are talking about two different things!

You are working on the problem of figuring out the hidden state based on noisy observations and a transition model.

I interpreted your statement much more broadly, so I was trying to discuss the problem of computing the explicit next state based on perfect knowledge of the transition model, in which case (y_1-y)/dt is a perfectly viable approach to estimate the derivative.

(You can do better by adding higher-order terms of course, but I haven't found that to be universally useful compared to making dt smaller.)