Hacker News
by pilooch 3641 days ago
Hey, I just finished a custom ML malware classification system for one of the largest European corporations, large enough that some malware is targeted specifically at them. The result is 97% accuracy (they retrained and checked it on their own held-out dataset). More careful analysis is needed (many malware samples contain high-entropy 'zones' that may help the classifier find the right category), but overall it does work.
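A common way to locate high-entropy 'zones' like those mentioned above is to compute Shannon entropy over fixed-size byte windows. This is a minimal sketch, not the feature pipeline of the system described: the 256-byte window and the 7.0 bits/byte threshold are hypothetical choices (packed or encrypted data tends toward the 8 bits/byte maximum), and the function names are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(data).values())


def windowed_entropy(data: bytes, window: int = 256, step: int = 256) -> list[float]:
    """Entropy of each window; a buffer shorter than `window` yields one value."""
    last_start = max(len(data) - window, 0)
    return [shannon_entropy(data[i:i + window]) for i in range(0, last_start + 1, step)]


def high_entropy_zones(data: bytes, threshold: float = 7.0,
                       window: int = 256, step: int = 256) -> list[int]:
    """Byte offsets of windows whose entropy exceeds `threshold` (hypothetical default)."""
    scores = windowed_entropy(data, window, step)
    return [i * step for i, h in enumerate(scores) if h > threshold]


# Synthetic example: 512 repetitive bytes followed by 512 maximally varied bytes.
sample = b"A" * 512 + bytes(range(256)) * 2
print(windowed_entropy(sample))    # [0.0, 0.0, 8.0, 8.0]
print(high_entropy_zones(sample))  # [512, 768]
```

Non-overlapping windows keep this cheap; a smaller `step` localizes zones more finely at a higher cost.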

See the Microsoft/Kaggle challenge on classifying malware families; the winning solution reached >99% accuracy, IIRC.


Can you describe a security setting where 97% accuracy is actually useful? Unless the events you're looking at are low-volume, or you somehow have much more malicious data than everyone else, that seems like a recipe for results that are mostly false positives (FPs).
For context, a company can easily see ~1B security-related events a day, so misclassifying even 0.1% of those means some poor junior analyst has 1,000,000 tickets a day to slog through. If you expand that to full packet captures, as suggested in the article... ouch.
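The base-rate arithmetic above can be sketched as follows. The 1-in-a-million malicious base rate, and reading 97% accuracy as a 3% false-positive rate on benign events, are hypothetical assumptions for illustration, not figures from the thread; the function names are likewise illustrative.

```python
def daily_false_positives(events_per_day: int, false_positive_rate: float) -> int:
    """Expected number of benign events wrongly flagged per day."""
    return round(events_per_day * false_positive_rate)


def alert_precision(events_per_day: int, malicious_fraction: float,
                    true_positive_rate: float, false_positive_rate: float) -> float:
    """Fraction of raised alerts that are actually malicious (precision)."""
    malicious = events_per_day * malicious_fraction
    benign = events_per_day - malicious
    true_alerts = malicious * true_positive_rate
    false_alerts = benign * false_positive_rate
    return true_alerts / (true_alerts + false_alerts)


# The comment's example: ~1B events/day with 0.1% misclassified.
print(daily_false_positives(1_000_000_000, 0.001))  # 1000000

# Hypothetical: 1 in a million events malicious, 97% detection, 3% FP rate.
p = alert_precision(1_000_000_000, 1e-6, 0.97, 0.03)
print(f"{p:.1e}")  # 3.2e-05
```

Even a classifier that looks excellent on a balanced test set produces mostly false alarms when real positives are this rare; that is the base-rate effect the comment describes.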

(We do some cool visual analytics work here, including unsupervised learning and classification, and focus more on the problem of "given an incident you're already investigating, what else should you look at next across all your tools?")

We're talking hundreds of thousands of malware samples here.
The 99% figure means little when the model suffers from a problem similar to the one the immune system has with cancer: a non-stationary adversary versus a fixed model.
That's what research under the banners of "security via diversity" and "moving target" defense is addressing. I recall the Hydra firewall from Sentinel did that sort of thing. OpenBSD and grsecurity do it in open source for parts of their OS (e.g., address-space layout randomization). Such methods can be combined with ML classifiers like these.
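To make the stationarity point concrete, here is a toy sketch: a fixed threshold on a single hypothetical feature (maximum windowed entropy, in bits/byte) is fit once, then evaluated on a later batch in which the adversary has adapted, e.g. by padding payloads to lower their entropy. All numbers and names are made up for illustration and do not come from the systems discussed.

```python
from statistics import mean


def fit_threshold(benign: list[float], malicious: list[float]) -> float:
    """Fit a fixed decision threshold: midpoint between the class means."""
    return (mean(benign) + mean(malicious)) / 2


def accuracy(threshold: float, benign: list[float], malicious: list[float]) -> float:
    """Share of samples classified correctly (malicious if score > threshold)."""
    correct = sum(x <= threshold for x in benign) + sum(x > threshold for x in malicious)
    return correct / (len(benign) + len(malicious))


# Hypothetical training data: packed malware scores high, benign scores lower.
benign = [4.5, 5.0, 5.5, 6.0]
malicious_then = [7.2, 7.5, 7.8, 7.9]
threshold = fit_threshold(benign, malicious_then)

# Hypothetical later batch: the adversary has shifted toward lower entropy.
malicious_now = [5.2, 5.8, 6.1, 7.4]

print(accuracy(threshold, benign, malicious_then))  # 1.0
print(accuracy(threshold, benign, malicious_now))   # 0.625
```

Nothing about the fixed model is wrong on the data it was fit to; the accuracy drop comes entirely from the adversary moving the distribution.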

Interesting name. It reminds me of Symbiotes, a security scheme I briefly evaluated on Schneier's blog; it injected security into legacy embedded applications, with various tradeoffs. Where did you get the name?