by defrost 607 days ago
I've been a heavy user of Savitzky-Golay filters (linear time series, rectangular grid images, cubic space domains | first, second and third derivatives | balanced and unbalanced (returning central region smoothed values and values at edges)) since the 1980s.

The usual implementation is as a convolution filter based on the premise that the underlying data is regularly sampled.
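
For readers who haven't seen it, a minimal sketch of that convolution form, assuming unit-spaced samples unless `spacing` is given. The names `sg_coeffs` and `sg_filter` are made up for illustration and are not taken from the linked implementation. The weights come from a least-squares polynomial fit over the window, evaluated at the window centre.

```python
import math
import numpy as np

def sg_coeffs(half_width, order, deriv=0, spacing=1.0):
    """Savitzky-Golay weights for the centre of a window of
    2*half_width + 1 regularly spaced samples (illustrative helper)."""
    offsets = np.arange(-half_width, half_width + 1, dtype=float)
    # Column j of the design matrix holds offsets**j.
    A = np.vander(offsets, order + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse maps the samples to the fitted
    # polynomial coefficient a_deriv.
    weights = np.linalg.pinv(A)[deriv]
    # The k-th derivative of the fitted polynomial at offset 0 is k! * a_k,
    # rescaled from sample units to physical units.
    return weights * math.factorial(deriv) / spacing**deriv

def sg_filter(y, half_width, order, deriv=0, spacing=1.0):
    """Apply the weights to every full window; edges are dropped."""
    w = sg_coeffs(half_width, order, deriv, spacing)
    # np.convolve flips its kernel, so reverse the weights to get a
    # plain sliding dot product.
    return np.convolve(np.asarray(y, dtype=float), w[::-1], mode="valid")
```

With `half_width=2, order=2` the smoothing weights are the familiar 5-point set (-3, 12, 17, 12, -3)/35. A NaN anywhere in a window turns that output sample into NaN, which is exactly where the "sensible infill" problem starts.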

The pain-in-the-arse occasional reality is missing data and|or present but glitched|spiked data .. both of which require a "sensible infill" to continue with a convolution.

This is a nice implementation and a potentially useful bit of kit; the elephant in the room (from my PoV) is "how come the application domain is irregularly sampled data"?

Generally (engineering, geophysics, etc) great lengths are taken to clock data samples like a metronome (in time and|or space (as required most)).

I'm assuming that your gridded GeoTIFF data field is regularly sampled in both the X and Y axis?

2 comments

Yup, my data is nicely gridded so I can use the convolution approach pretty easily. Agreed though - missing data at the edges or in the interior is annoying. For a while I was thinking I should recompute the SG coefficients every time I hit a missing data point so that they just "jump over" the missing values, giving me a derivative at the missing point based on the values that come before and after it, but for now I'm just throwing away any convolutions that hit a missing value.
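
A rough sketch of the "jump over" idea, assuming missing samples are marked as NaN in a 1-D array; the function names are hypothetical and this is not the code described in the comment. For each window the polynomial is fitted only to the points that are present, so a derivative can still be produced at a missing centre point.

```python
import math
import numpy as np

def sg_weights_for_mask(valid, order, deriv=0, spacing=1.0):
    """Weights for a centred window where `valid` flags usable samples.
    Returns None when too few points remain to fit the polynomial."""
    half = len(valid) // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    if valid.sum() < order + 1:
        return None
    # Fit only at the offsets that actually carry data.
    A = np.vander(offsets[valid], order + 1, increasing=True)
    w = np.zeros(len(valid))
    w[valid] = np.linalg.pinv(A)[deriv] * math.factorial(deriv) / spacing**deriv
    return w

def sg_filter_skip_nan(y, half_width, order, deriv=0, spacing=1.0):
    """Sliding S-G estimate that recomputes weights per window.
    Positions without a full window stay NaN."""
    y = np.asarray(y, dtype=float)
    out = np.full(len(y), np.nan)
    for i in range(half_width, len(y) - half_width):
        window = y[i - half_width : i + half_width + 1]
        valid = ~np.isnan(window)
        w = sg_weights_for_mask(valid, order, deriv, spacing)
        if w is not None:
            # Index with the mask so NaN never enters the sum.
            out[i] = np.dot(w[valid], window[valid])
    return out
```

This recomputes a pseudo-inverse for every window, which is slow compared with a single convolution; throwing away the affected windows, as described above, is the cheap alternative.
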
> For a while I was thinking I should recompute the SG coefficients every time

We had, in our geophysics application, a "pre-computed" coefficient cache. The primary filters (central symmetric smoothing at various lengths) were common choices and almost always there to grab. Missing values were either cheaply "faked" for quick'n'dirty displays or infilled by prediction filters: S-Gs computed to use the existing points within range to replace the missing value. That prediction filter was either a lookup from the indexed filter cache or a fresh filter generation to use and stash in the cache.

It's a complication (in the mechanical watch sense) to add, but with code to generate coefficients already existing it's really just looking at the generation times versus the hassle of indexing and storing them as created and the frequency of reuse of "uncommon" patterns.
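
One way the cache idea could look, sketched with Python's `functools.lru_cache` keyed on the pattern of present/missing points in the window. This is an assumption about the shape of such a cache, not a description of the geophysics code; `cached_weights` and `infill` are hypothetical names, and unit sample spacing is assumed.

```python
import math
from functools import lru_cache
import numpy as np

@lru_cache(maxsize=None)
def cached_weights(pattern, order, deriv=0):
    """S-G weights for a centred window; `pattern` is a tuple of bools
    marking which samples exist. Each pattern is computed once."""
    valid = np.array(pattern, dtype=bool)
    half = len(valid) // 2
    offsets = np.arange(-half, half + 1, dtype=float)[valid]
    if len(offsets) < order + 1:
        return None  # not enough points to fit the polynomial
    A = np.vander(offsets, order + 1, increasing=True)
    w = np.zeros(len(valid))
    w[valid] = np.linalg.pinv(A)[deriv] * math.factorial(deriv)
    w.setflags(write=False)  # cached arrays are shared, so keep them read-only
    return w

def infill(y, half_width, order):
    """Replace interior NaNs with the value predicted from their neighbours."""
    y = np.asarray(y, dtype=float)
    filled = y.copy()
    for i in np.flatnonzero(np.isnan(y)):
        if i < half_width or i + half_width >= len(y):
            continue  # edge gaps would need an unbalanced window
        window = y[i - half_width : i + half_width + 1]
        mask = ~np.isnan(window)
        w = cached_weights(tuple(mask.tolist()), order)
        if w is not None:
            filled[i] = np.dot(w[mask], window[mask])
    return filled
```

`cached_weights.cache_info()` reports hits and misses, which gives a direct read on how often the "uncommon" patterns actually get reused versus regenerated.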

Yeah, regularly sampled is the goal almost always, and great when it's available! The main times I deal with non-uniformly sampled data are with jitter, missing data, etc.
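
For the jitter case, a minimal sketch of the same local polynomial fit using the actual sample positions instead of assumed regular offsets. The function name is hypothetical and this is a generic least-squares fit via NumPy, not the linked implementation.

```python
import math
import numpy as np
from numpy.polynomial import polynomial as P

def local_poly_estimate(t, y, half_width, order, deriv=0):
    """Fit a polynomial to each window of 2*half_width + 1 samples at
    their true positions `t` and evaluate its derivative at the centre."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.full(len(t), np.nan)
    for i in range(half_width, len(t) - half_width):
        sl = slice(i - half_width, i + half_width + 1)
        # Shift so the centre sample sits at 0; the fit is then read off
        # directly from the coefficient of degree `deriv`.
        coeffs = P.polyfit(t[sl] - t[i], y[sl], order)
        out[i] = math.factorial(deriv) * coeffs[deriv]
    return out
```

Because the positions differ from window to window, no single set of weights can be reused, which is the price paid for dropping the regular-sampling premise.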