by siddhartb_ 2271 days ago
The code and datasets we used are available at https://github.com/bhatiasiddharth/MIDAS

Nice! I like that there's a ready-to-use command-line utility.

I think it'd benefit from a refactor to actually allow real-time streaming from stdin.
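For illustration, a minimal Python sketch of what reading edges from stdin one at a time could look like. This is not the repository's code: the `src,dst,timestamp` line format and the names `stream_edges` and `main` are assumptions made up for this example.

```python
import sys
from typing import Iterable, Iterator, Tuple

Edge = Tuple[int, int, int]  # (source, destination, timestamp)


def stream_edges(lines: Iterable[str]) -> Iterator[Edge]:
    """Lazily parse 'src,dst,timestamp' lines, yielding one edge at a time.

    The comma-separated format is an assumption for this sketch.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        src, dst, timestamp = (int(field) for field in line.split(","))
        yield src, dst, timestamp


def main() -> None:
    # Iterating over sys.stdin hands over each line as it is read,
    # so edges can be processed before the input ends.
    for src, dst, timestamp in stream_edges(sys.stdin):
        print(src, dst, timestamp)  # placeholder for a per-edge scoring call
```

Because `stream_edges` is a generator, nothing is buffered beyond the current line, which is the property a stdin-based tool would need for live input.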

Do you have any hints on how to choose timestamp units, and how that choice affects the parameters I should pick?

Nice suggestion. Will definitely try to refactor. Thanks!

In most cases, the timestamps should come with the data itself (assuming it's a dynamic graph). If you have to choose them, pick the unit by looking at how many edges usually arrive in one time tick (second, minute, etc.).

Timestamps don't affect any parameter other than alpha (the temporal decay factor). You may want to check how the contribution of past edges to the anomalousness of the current edge is decayed. If the timestamps are very granular, a smaller alpha should be chosen. Hope it helps.
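To make the interaction between tick size and alpha concrete, here is a rough Python sketch of a decayed edge counter. It is an illustration only, not the MIDAS implementation: the class `DecayedEdgeCounter` and the rule of multiplying every count by alpha once per elapsed tick are assumptions for this example.

```python
from collections import defaultdict


class DecayedEdgeCounter:
    """Edge counts where past edges are discounted by alpha per elapsed tick.

    Hypothetical helper for illustration; the decay rule is an assumption.
    """

    def __init__(self, alpha: float) -> None:
        if not 0.0 < alpha < 1.0:
            raise ValueError("alpha must lie strictly between 0 and 1")
        self.alpha = alpha
        self.current_tick = None
        self.counts = defaultdict(float)

    def add(self, src: int, dst: int, tick: int) -> float:
        """Record one edge at `tick` and return its decayed count."""
        if self.current_tick is not None and tick > self.current_tick:
            # An edge seen k ticks ago keeps a weight of alpha ** k.
            factor = self.alpha ** (tick - self.current_tick)
            for key in self.counts:
                self.counts[key] *= factor
        if self.current_tick is None or tick > self.current_tick:
            self.current_tick = tick
        self.counts[(src, dst)] += 1.0
        return self.counts[(src, dst)]
```

Under this rule, the same wall-clock gap spans more ticks when the unit is finer, so the chosen unit changes how quickly old edges fade for a given alpha.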

Thank you for the explanation.

I'm looking forward to M-Stream for multi-dimensional data - but I have one question for that. Is there some preferred approach for selecting features in multi-dimensional anomaly detection?

Because I wonder whether, given enough dimensions, everything would look anomalous. Kind of like how p-hacking works (at p=0.05, roughly one in twenty tests of a true null comes out significant by sheer luck).
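The arithmetic behind that worry can be sketched in a few lines of Python. The sketch assumes independent tests, which real features rarely are, and the function names are made up for this example.

```python
def expected_false_positives(level: float, num_tests: int) -> float:
    """Average number of true nulls flagged when each is tested at `level`."""
    return level * num_tests


def family_wise_error_rate(level: float, num_tests: int) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - level) ** num_tests


# With 20 independent tests at the 0.05 level, one false positive is
# expected on average, and the chance of at least one is about 64%.
print(expected_false_positives(0.05, 20))
print(round(family_wise_error_rate(0.05, 20), 4))
```

The chance of at least one spurious flag grows quickly with the number of tests, which is the same pressure that adding dimensions puts on a per-feature anomaly threshold.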

Interesting question. With an increase in dimensions, we consider the correlation between the features in addition to considering them individually. The work is currently under review. Feel free to get in touch and I can update you once we release the MStream work.