


	by pilooch 3641 days ago
	Hey, I just finished a custom ML malware system for one of the largest European corporations, large enough that some malware is targeted specifically at them. The result is 97% accuracy (they retrained and checked on their own held-out dataset). More careful analysis is needed (many malware samples have high-entropy 'zones' that may help the classifier find the right category), but overall it does work. See the Microsoft / Kaggle challenge on classifying malware families; the winning solution is > 99% accuracy IIRC.


Eridrus 3641 days ago

Can you describe a security setting where 97% accuracy is actually useful? Unless the events you're looking at are low volume, or you somehow have much more malicious data than everyone else, that seems like a recipe for your results being primarily false positives (FPs).
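A minimal sketch of the base-rate argument above. It assumes "97% accuracy" can be read as a 97% detection rate with a 3% false-positive rate, which the thread does not confirm, and the prevalence values are hypothetical. Under those assumptions, at a base rate of 1 malicious event in 1,000, only about 3% of raised alerts would be real.

```python
def alert_precision(prevalence: float, tpr: float, fpr: float) -> float:
    """Fraction of raised alerts that are truly malicious (Bayes' rule)."""
    true_alerts = prevalence * tpr            # malicious and flagged
    false_alerts = (1.0 - prevalence) * fpr   # benign but flagged
    return true_alerts / (true_alerts + false_alerts)


# Assumption: "97% accuracy" read as 97% detection rate, 3% false-positive rate.
TPR, FPR = 0.97, 0.03

# Hypothetical base rates: balanced data, 1 in 100, 1 in 1,000.
for prevalence in (0.5, 0.01, 0.001):
    p = alert_precision(prevalence, TPR, FPR)
    print(f"prevalence={prevalence:<6} precision={p:.3f}")
```

On a balanced dataset the precision matches the headline 97%, which is why a held-out accuracy figure alone says little about usefulness on mostly benign traffic.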


lmeyerov 3640 days ago

For context, a company can easily get ~1B security-related events a day, so even misreporting, say, 0.1% of those a day means some poor junior analyst has 1,000,000 tickets to slog through. If you expand that to full packet captures as suggested in the article... ouch.
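The arithmetic above, as a short sketch. The event volume and error rate come from the comment; the per-analyst triage capacity is a made-up figure used only to give the ticket count a sense of scale.

```python
import math

EVENTS_PER_DAY = 1_000_000_000  # ~1B security-related events a day (from the comment)
ERROR_RATE = 0.001              # 0.1% misreported (from the comment)
TICKETS_PER_ANALYST = 50        # hypothetical daily triage capacity, not from the thread


def daily_bad_tickets(events: int, error_rate: float) -> int:
    """Number of wrongly reported events per day."""
    return round(events * error_rate)


def analysts_needed(tickets: int, per_analyst: int) -> int:
    """Analysts required to clear the queue in one day, rounded up."""
    return math.ceil(tickets / per_analyst)


tickets = daily_bad_tickets(EVENTS_PER_DAY, ERROR_RATE)
print(tickets, analysts_needed(tickets, TICKETS_PER_ANALYST))
```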

(We do some cool visual analytics work here, including unsupervised learning / classification, and target more of the problem of "given an incident you're already investigating, what else should you now look at from across all your tools?")


pilooch 3640 days ago

We're talking hundreds of thousands of malware samples here.


Cybiote 3641 days ago

The 99% means little when it suffers from the same sort of problem the immune system has with cancer: a non-stationary adversary versus a fixed model.


nickpsecurity 3641 days ago

That's what the research under the banners "security via diversity" and "moving target" is doing. I recall the Hydra firewall from Sentinel did that sort of thing. OpenBSD and grsecurity do it in OSS for parts of their OS. Such methods can be combined with these.

Interesting name. Reminds me of a security scheme, Symbiotes, that I briefly evaluated on Schneier's blog. It injected security into legacy embedded applications, with various tradeoffs. Where did you get the name from?
