by defrost 702 days ago
> The lack of data for cat seizures is a challenge in this endeavour

Not so much as you might think.

The approach I'd take (from airborne geophysics) is to treat the existing datasets as "environmental normal": pull out tens of thousands of (overlapping) five-minute data runs and use those as input vectors to an SVD (Singular Value Decomposition) reduction, which becomes the kernel of monitoring going forward.
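To make that first step concrete, here's a rough sketch of the windowing + SVD part in NumPy. Everything specific in it is my assumption for illustration, not something from the datasets: a single-channel signal, a synthetic sine-plus-noise stand-in for "normal" data, the helper names `overlapping_windows` and `fit_normal_basis`, and the window length, step, and rank.

```python
import numpy as np

def overlapping_windows(signal, window_len, step):
    """Slice a 1-D signal into overlapping windows, one window per row."""
    starts = range(0, len(signal) - window_len + 1, step)
    return np.stack([signal[s:s + window_len] for s in starts])

def fit_normal_basis(windows, rank):
    """Return the mean window and the top `rank` right singular vectors."""
    mean = windows.mean(axis=0)
    # Rows of vt are orthonormal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(windows - mean, full_matrices=False)
    return mean, vt[:rank]

# Synthetic stand-in for "normal" accelerometer data (assumed, not real data).
rng = np.random.default_rng(0)
t = np.arange(20_000)
normal = np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(t.size)

# Hypothetical sizes: 300-sample windows, advancing 20 samples at a time.
windows = overlapping_windows(normal, window_len=300, step=20)
mean, basis = fit_normal_basis(windows, rank=5)
print(windows.shape, basis.shape)
```

The rank is the knob that matters: too low and ordinary behaviour leaks into the residual, too high and the basis starts to explain everything, including the odd stuff. In practice you'd pick it by looking at how fast the singular values fall off.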

Next, rig up the cat in question with accelerometers and record data just as was done in the prior datasets.

Your input now is a continuous pipeline (say every 20 seconds) of "the last five minutes of data" as a vector - reduce each vector to kernel (the part spanned by the basis for the "normal" dataset) + noise (the part that doesn't match the normal span).

There will be a regular amount of "noise"; seizures and unusual behaviour should spike the amount of noise and deserve attention.
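A minimal sketch of the kernel + noise split and the "spike" check, again in NumPy. The assumptions are mine: synthetic random-phase sine windows as "normal", an injected burst of noise standing in for an unusual event, the helper names `residual_energy` and `normal_window`, and a mean + 5 standard deviations threshold, which is just one simple way to flag a spike.

```python
import numpy as np

def residual_energy(window, mean, basis):
    """Norm of the part of `window` that lies outside the span of `basis` rows."""
    centered = window - mean
    projected = basis.T @ (basis @ centered)  # the "kernel" part
    return float(np.linalg.norm(centered - projected))  # the "noise" part

rng = np.random.default_rng(1)
window_len, rank = 200, 4  # hypothetical sizes
t = np.arange(window_len)

def normal_window():
    """Synthetic 'normal' window: a sine at a random phase plus small noise."""
    phase = rng.uniform(0, 2 * np.pi)
    return np.sin(2 * np.pi * t / 40 + phase) + 0.05 * rng.standard_normal(window_len)

# Build the "normal" basis from a back catalog of boring windows.
train = np.stack([normal_window() for _ in range(500)])
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
basis = vt[:rank]

# The regular amount of noise, and an assumed threshold above it.
baseline = np.array([residual_energy(w, mean, basis) for w in train])
threshold = baseline.mean() + 5 * baseline.std()

quiet = normal_window()
burst = normal_window()
burst[80:120] += 3.0 * rng.standard_normal(40)  # injected disturbance

quiet_score = residual_energy(quiet, mean, basis)
burst_score = residual_energy(burst, mean, basis)
print(quiet_score, burst_score, threshold)
```

On real data you'd run `residual_energy` on each new "last five minutes" vector as it arrives and look at anything that clears the threshold. The threshold itself is something you'd tune once there are a few known events to check against.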

After a bit, you'll know what you're looking for (/cough /handwave /details).

This, more or less, is how "out of band" signal is found in 256-channel radiometric spectrometer surveys - primed with a back catalog of hours of regular, boring survey data and trained to look for the abnormal.

1 comments

I agree that this is not a "big data" type problem, and that your approach of modelling the normal days is the way to go. Basically outlier/anomaly detection. But still, before one has 3-10 examples of the event being looked for, it is very hard to say anything meaningful about how well it will work in practice. Getting over to the low data regime from the current zero data regime would help.

The insight from decades in exploration is "all anomalies are interesting"; the background drone matching helps to find the things of interest (until they're common and understood) - cats fighting, cats having sex, kittens stalk playing, etc. are all going to have differing (and overlapping) fingerprinting.

As with many projects of this nature there's very little more to say at this point in time until there's a pile of data to start wading through :)

Maybe someone will run with it.