Hacker News

by _lm_ 3518 days ago

> To avoid extracting irrelevant features, the TSFRESH package has a built-in filtering procedure. This filtering procedure evaluates the explaining power and importance of each characteristic for the regression or classification tasks at hand.

> It is based on the well developed theory of hypothesis testing and uses a multiple test procedure. As a result the filtering process mathematically controls the percentage of irrelevant extracted features.

Here's the paper on this: https://arxiv.org/abs/1610.07717

It seems that the relevance of the features is somewhat tunable based on the p-value you choose for the statistical tests. (Every feature selection algorithm I can think of has some tunable parameter, although the information-theoretic ones just depend on the length of features you're willing to consider.)

1 comment

MaxBenChrist 3517 days ago

The individual feature significance tests do not have any parameters; they just generate the p-values.

The only parameter that one can tune is the overall percentage of irrelevant extracted features. That is the expected false discovery rate (FDR) of the Benjamini-Yekutieli procedure.
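
For readers who have not seen the procedure, here is a minimal sketch of the Benjamini-Yekutieli step-up rule applied to a list of per-feature p-values. It is not TSFRESH's own code: the function name `benjamini_yekutieli` and the argument name `fdr_level` are assumptions made for this illustration, and the p-values in the example are made up.

```python
def benjamini_yekutieli(p_values, fdr_level=0.05):
    """Return one boolean per p-value: True if the feature is kept as relevant.

    Illustrative sketch only (hypothetical helper, not a library API).
    """
    m = len(p_values)
    if m == 0:
        return []

    # Correction factor c(m) = 1 + 1/2 + ... + 1/m, which makes the
    # procedure valid under arbitrary dependence between the tests.
    c_m = sum(1.0 / i for i in range(1, m + 1))

    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= k / (m * c(m)) * fdr_level.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / (m * c_m) * fdr_level:
            largest_k = rank

    # Reject the null hypothesis (keep the feature) for the k smallest p-values.
    relevant = [False] * m
    for idx in order[:largest_k]:
        relevant[idx] = True
    return relevant


# Made-up p-values for five candidate features.
example_p_values = [0.001, 0.8, 0.02, 0.0001, 0.5]
print(benjamini_yekutieli(example_p_values, fdr_level=0.05))
# prints [True, False, False, True, False]
```

Note that the per-feature tests are computed once and never touched again; only `fdr_level` changes which features survive, which is the point made above.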
