Neither title, comments here nor Github repo indicated to an outsider like me what this is or what field it's even from. The abstract of the linked paper helps though:
"Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. MIDAS has the following properties: (a) it detects microcluster anomalies while providing theoretical guarantees about its false positive probability; (b) it is online, thus processing each edge in constant time and constant memory, and also processes the data 162−644 times faster than state-of-the-art approaches; (c) it provides 42%-48% higher accuracy (in terms of AUC) than state-of-the-art approaches."
This was in case someone wants to use the exact same API(for familiarity) in Go, they could. Posted this here to show that there is now a Go implementation of it. :)
After this, I'll be implementing a fit/predict api that mimics SKLearn's API and showing more examples on how to use it in a streaming fashion.
My last gig had some newly minted CS graduates who proposed complimenting our monitoring with some anomaly detection.
Having done a bit of both data mining and optimization, a lifetime ago, I could kinda follow the big data and machine learning stuff the kids were doing.
Whereas anomaly detection and fault prediction seem magical to me.
There is a recorded presentation of the paper at https://youtu.be/Bd4PyLCHrto
The first 5-10 minutes or so should be quite explanatory. Please feel free to let me know if you have any specific doubts. Thanks.
Detecting Intrusions, Denial of Service (DoS) attacks, Distributed Denial of Service (DDoS) attacks. It can also be used to detect fake profiles in Social Networks like Twitter, Facebook, Amazon reviews, and Financial Frauds. Basically any suspicious similar group of edges in time-evolving/dynamic graphs.
Could you go into the selection criteria for those 3 numbers? I see 2 and 769 on the original MIDAS repo as default number of hash functions and buckets, but without explanation of why they're chosen. The third argument, "m", I don't see explained anywhere.
Hi, I'm the author of the MIDAS algorithm. We choose the number of hash functions and bucket according to the maximum error we can tolerate and the false positive probability theoretical guarantee we want. Please refer to the AAAI paper here: https://www.comp.nus.edu.sg/~sbhatia/assets/pdf/midas.pdf
Let me know if you need more details.
"Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. MIDAS has the following properties: (a) it detects microcluster anomalies while providing theoretical guarantees about its false positive probability; (b) it is online, thus processing each edge in constant time and constant memory, and also processes the data 162−644 times faster than state-of-the-art approaches; (c) it provides 42%-48% higher accuracy (in terms of AUC) than state-of-the-art approaches."