by MauranKilom 2244 days ago
Neither the title, the comments here, nor the GitHub repo told an outsider like me what this is or even what field it's from. The abstract of the linked paper helps, though:

"Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. MIDAS has the following properties: (a) it detects microcluster anomalies while providing theoretical guarantees about its false positive probability; (b) it is online, thus processing each edge in constant time and constant memory, and also processes the data 162−644 times faster than state-of-the-art approaches; (c) it provides 42%-48% higher accuracy (in terms of AUC) than state-of-the-art approaches."
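For readers who want something more concrete than the abstract, here is a rough Go sketch of the scoring idea as I understand it from the paper: for each edge, compare how often it appeared in the current time tick against its average rate so far, using a chi-squared style statistic. Everything named here (`Edge`, `Scorer`, `Score`, `chiSquared`) is hypothetical and is not the API of the Go or C++ repos. The sketch also keeps exact counts in maps, whereas MIDAS uses count-min sketches to get constant memory, so treat this as an illustration rather than the real implementation.

```go
package main

import "fmt"

// Edge is a timestamped directed edge. Tick is assumed to start at 1
// and never decrease along the stream.
type Edge struct {
	Src, Dst, Tick int
}

type pair struct{ src, dst int }

// Scorer keeps exact per-edge counts in maps. This is a simplification:
// MIDAS itself stores these counts in count-min sketches so that memory
// stays constant no matter how many distinct edges arrive.
type Scorer struct {
	tick    int
	current map[pair]float64 // a: occurrences of the edge in the current tick
	total   map[pair]float64 // s: occurrences of the edge over all ticks
}

func NewScorer() *Scorer {
	return &Scorer{tick: 1, current: map[pair]float64{}, total: map[pair]float64{}}
}

// Score records the edge and returns its anomaly score.
func (sc *Scorer) Score(e Edge) float64 {
	if e.Tick > sc.tick {
		// A new tick has started, so the per-tick counts start over.
		sc.current = map[pair]float64{}
		sc.tick = e.Tick
	}
	k := pair{e.Src, e.Dst}
	sc.current[k]++
	sc.total[k]++
	return chiSquared(sc.current[k], sc.total[k], float64(e.Tick))
}

// chiSquared compares the current-tick count a with the mean rate s/t.
// It returns 0 when there is no history to compare against.
func chiSquared(a, s, t float64) float64 {
	if s == 0 || t <= 1 {
		return 0
	}
	d := (a - s/t) * t
	return d * d / (s * (t - 1))
}

func main() {
	sc := NewScorer()
	stream := []Edge{
		{1, 2, 1}, {1, 2, 2}, {1, 2, 3}, // one edge per tick: steady behavior
		{3, 4, 3}, {3, 4, 3}, {3, 4, 3}, {3, 4, 3}, // sudden burst in tick 3
	}
	for _, e := range stream {
		fmt.Printf("%d->%d at tick %d: score %.2f\n", e.Src, e.Dst, e.Tick, sc.Score(e))
	}
}
```

The edge that shows up once per tick scores zero because its current count matches its average rate, while the edge that suddenly appears several times within one tick gets a score that grows with each repeat. That burst is the kind of "microcluster" the abstract refers to.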

2 comments

Thanks for your comments.

I know the documentation and examples are still a bit raw; I'll be working on them.

When doing the implementation, I tried to mimic the original API (https://github.com/bhatiasiddharth/MIDAS), which was written in C++ first.

That way, anyone who wants the exact same API in Go (for familiarity) can have it. I posted this here to show that there is now a Go implementation. :)

After this, I'll implement a fit/predict API that mimics scikit-learn's and add more examples showing how to use it in a streaming fashion.

Thanks. I'd love some ELI5s on this.

My last gig had some newly minted CS graduates who proposed complementing our monitoring with some anomaly detection.

Having done a bit of both data mining and optimization, a lifetime ago, I could kinda follow the big data and machine learning stuff the kids were doing.

Whereas anomaly detection and fault prediction seem magical to me.

There is a recorded presentation of the paper (https://youtu.be/Bd4PyLCHrto). The first 5-10 minutes or so should be quite explanatory. Please feel free to let me know if you have any specific doubts. Thanks.