by hkr_mag 3608 days ago
Thanks for the feedback!

1. Customers analyze traffic with locally installed NGINX-based instances (there is no DNS takeover). These instances send application and traffic statistics to Wallarm Cloud so we can run the machine learning there. We did a lot of work on the initial training of the system using our own experience in web app security (250+ pentests for top-tier companies, plus a lot of research by our team, such as the SSRF bible). We also use various honeypots and, now, statistics from customers with high-volume traffic.

2. Ivan covered some details of the ML technique in a reply to another comment.

3. We solve a different task than SiftScience. SiftScience provides fraud detection. Wallarm protects web apps and APIs against data breaches. For some of our customers, though, these tasks are related.

1 comments

Thanks for the answers.

I have clarifying questions:

How much of your machine learning is used for understanding the application (as Ivan said elsewhere, clustering login functionality together), and how much is actually used for fingerprinting vulnerability identification attempts in user input?

To place this in a broader context: you do not need machine learning to identify many cases of malicious user input; you can rely on simple heuristics. There is likely no reason for a user to submit `<script>alert(1);</script>`, which is an obvious test for low-hanging XSS fruit. Any good WAF will catch this.
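To make the "simple heuristics" point concrete, here is a minimal sketch of the kind of check I mean. The pattern list and the `looks_like_xss` name are my own assumptions for illustration, not taken from any particular WAF, and a real rule set would be much larger.

```python
import html
import re
from urllib.parse import unquote

# Hypothetical signatures for illustration only; real WAF rule sets are far larger.
XSS_PATTERNS = [
    re.compile(r"<\s*script\b", re.IGNORECASE),   # opening <script> tag
    re.compile(r"\bon\w+\s*=", re.IGNORECASE),    # inline handlers such as onerror=
    re.compile(r"javascript\s*:", re.IGNORECASE), # javascript: URLs
]


def looks_like_xss(value: str) -> bool:
    """Return True if the input matches any of the hand-written XSS signatures."""
    # Undo one layer of URL encoding and HTML entities before matching.
    decoded = html.unescape(unquote(value))
    return any(pattern.search(decoded) for pattern in XSS_PATTERNS)
```

Calling `looks_like_xss("<script>alert(1);</script>")` flags the probe, while ordinary text such as `"hello world"` passes. The weakness is that every new attack class needs another hand-written rule.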

Given that, does Wallarm mostly use heuristics for identifying malicious user input, or does it also bring machine learning into this process to find non-obvious input patterns that could indicate penetration testing attempts?

Our attack type recognition is based on machine learning, which first produces lexemes and then syntax constructions (patterns) from existing attacks. For example, in the case of memcached injections (more details: https://www.blackhat.com/docs/us-14/materials/us-14-Novikov-...) we can train the system to detect these attacks without regexps or new heuristic rules.
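As a rough illustration only (this is a toy sketch, not Wallarm's actual implementation), the "lexemes first, then patterns" idea could look like the following. The token classes, the n-gram approach, the sample payloads, and all function names are assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical lexeme classes; exactly one named group matches per token.
TOKEN_RE = re.compile(
    r"(?P<CRLF>\r\n|\n)|(?P<NUM>\d+)|(?P<WORD>[A-Za-z_]+)"
    r"|(?P<SPACE>[ \t]+)|(?P<PUNCT>.)",
    re.DOTALL,
)


def lexemes(payload):
    """Step 1: reduce a payload to a sequence of lexeme class names."""
    return [m.lastgroup for m in TOKEN_RE.finditer(payload)]


def ngrams(tokens, n=4):
    """All distinct runs of n consecutive lexeme classes."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def learn_patterns(attacks, benign, n=4, min_support=2):
    """Step 2: keep n-grams seen in several attacks and in no benign sample."""
    counts = Counter(g for a in attacks for g in ngrams(lexemes(a), n))
    benign_grams = set().union(*(ngrams(lexemes(b), n) for b in benign))
    return {g for g, c in counts.items()
            if c >= min_support and g not in benign_grams}


def is_suspicious(payload, patterns, n=4):
    """Flag a payload that contains any learned attack pattern."""
    return not patterns.isdisjoint(ngrams(lexemes(payload), n))


# Made-up training samples: memcached-style command injection vs. normal values.
attacks = ["foo\r\nset injected 0 3600 4\r\nevil\r\n",
           "k\r\nset x 0 0 1\r\na\r\n"]
benign = ["user_42", "hello world 2024", "session key 12"]
patterns = learn_patterns(attacks, benign)
```

With these samples, `is_suspicious("abc\r\ndelete victim\r\n", patterns)` is flagged even though no `delete` payload was in the training set, because the learned patterns describe the token structure (a word, a line break, then a command-like word) and not a literal string. A toy like this will also misfire on harmless inputs with a similar shape, which is where a properly trained model and real traffic statistics matter.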