by agdpf 2499 days ago
If you keep getting pwned by APTs and you need to build a WAF then the security of your codebase is, er, poor, to say the least.

I never said they were on my network.

When I see patterns in traffic coming from 45,000 one-off hosts for a month straight it is clear that there is a distributed botnet behind the requests.
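To make the "one-off hosts" idea concrete, here is a minimal sketch of how such a cluster could be surfaced from a request log. It is not the poster's actual firewall logic. The `(ip, signature)` input shape, the `one_off_host_clusters` name, and the `min_hosts` threshold are all assumptions for illustration; a "signature" here stands for whatever request fingerprint you choose (user agent, header order, URL shape).

```python
from collections import defaultdict


def one_off_host_clusters(requests, min_hosts=3):
    """Find signatures shared by many hosts that each sent exactly one request.

    requests: iterable of (ip, signature) pairs (hypothetical log format).
    Returns a dict mapping signature -> number of one-off hosts using it.
    """
    hits_per_ip = defaultdict(int)
    ips_per_signature = defaultdict(set)
    for ip, signature in requests:
        hits_per_ip[ip] += 1
        ips_per_signature[signature].add(ip)

    clusters = {}
    for signature, ips in ips_per_signature.items():
        # Keep only hosts that appear exactly once in the whole log.
        one_off = {ip for ip in ips if hits_per_ip[ip] == 1}
        if len(one_off) >= min_hosts:
            clusters[signature] = len(one_off)
    return clusters


sample = [
    ("192.0.2.1", "sig-A"),
    ("192.0.2.2", "sig-A"),
    ("192.0.2.3", "sig-A"),
    ("198.51.100.7", "sig-B"),
    ("198.51.100.7", "sig-B"),
]
result = one_off_host_clusters(sample)
```

With the sample data, only `sig-A` qualifies: three distinct hosts, each seen once. A real deployment would need a much higher threshold and a time window.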

When I see a vulnerability scan from an Azure cloud instance seconds after I ban a block of Russian addresses, I can be sure there is coordination.
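A rough sketch of how that "scan right after a ban" timing could be flagged automatically. The function name, the event shapes, and the 30-second window are assumptions, not anything described in the comment; timestamps are plain seconds. It only produces candidates for a human to review.

```python
def scans_following_bans(ban_times, scan_events, window_seconds=30.0):
    """Pair each ban with scans that started shortly after it.

    ban_times: iterable of ban timestamps in seconds (hypothetical).
    scan_events: iterable of (source, start_time) pairs (hypothetical).
    Returns a list of (ban_time, source, delay_seconds) tuples.
    """
    scans = list(scan_events)  # allow the input to be a one-shot iterator
    matches = []
    for ban in ban_times:
        for source, start in scans:
            delay = start - ban
            # Only scans that begin after the ban, within the window, count.
            if 0 <= delay <= window_seconds:
                matches.append((ban, source, delay))
    return matches


bans = [100.0]
scans = [("cloud-instance-1", 105.0), ("cloud-instance-2", 500.0)]
flagged = scans_following_bans(bans, scans)
```

Here only the first scan is flagged, since it began 5 seconds after the ban; the second is far outside the window.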

And don't get me started on the Moldavian Registration Bots. Those are a combination of automated and human-assisted CAPTCHA solvers, and it took me almost a full week of careful observation to weed them out.

These are some of the things my application firewall can detect automatically. Every now and then I see a new pattern; that is all I was trying to say.

This seems like the worst possible reason not to have a WAF.
Right, you got it. Sometimes I feel like we're being attacked by bespoke systems, but it really must be off-the-shelf stuff since our full content is easily licensable. We shouldn't be worth the trouble.

We just weren't getting enough information from Bing or Google Analytics or CloudFlare, and when I developed a realtime activity dashboard, patterns started emerging: distributed web scrapers, registration bots, vulnerability scans, and some of these in tandem (e.g., scans commencing immediately after banning a block of addresses). And many of these are coming from cloud hosts, Azure being the worst, with Google a close second. This is the type of traffic they don't want you to see, so those respective analytics services just suppress it because it would be a negative advertisement if we could actually see what is happening in real time. I compared the numbers - Google was consistently underreporting our traffic by at least 40%, and a lot (not the majority, but enough to be noticeable) of that traffic was coming from their own hosted servers (not the indexing bots, but the user cloud instances).

CloudFlare implements temporary bans but I needed something permanent for those threats that were recognizable based on their request patterns.
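For illustration, a minimal sketch of a permanent ban list keyed on request patterns. The class name, the in-memory set, and the example regex are all hypothetical; the comment does not say how the real rules are written or stored, and a real system would persist the bans somewhere durable.

```python
import re


class PermanentBanList:
    """Ban an IP for good once any of its requests matches a known-bad pattern."""

    def __init__(self, patterns):
        # patterns: regular expressions matched against the request path.
        self._patterns = [re.compile(p) for p in patterns]
        self._banned = set()  # in-memory only; an assumption for brevity

    def is_blocked(self, ip, path):
        if ip in self._banned:
            return True
        if any(p.search(path) for p in self._patterns):
            self._banned.add(ip)  # never expires, unlike a temporary ban
            return True
        return False


# Hypothetical rule: nobody legitimate requests this path on our site.
bans = PermanentBanList([r"/wp-login\.php$"])
first = bans.is_blocked("192.0.2.1", "/index.html")
second = bans.is_blocked("192.0.2.1", "/wp-login.php")
third = bans.is_blocked("192.0.2.1", "/index.html")
```

The third call is blocked even though the path is harmless, because the host already tripped a rule on the second call.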

The ARIN squatting is the latest thing I'm seeing - a lot of requests coming from netblocks that are former DoD and RedHat addresses. The publicly available ARIN databases aren't entirely up-to-date and the bad guys know it, so some of the checks we depend on have to be taken with a grain of salt.

So far, I've been able to develop business rules to separate out the human activity from the carefully constructed scraper/probe attempts, but I fear that if they get just a bit more sophisticated I may lose that ability.

Wouldn't distributed scrapers be blocking Google Analytics scripts by default? Or are you sending data there server-side?
That is a fair point. It used to be that way, but around 2016 we started noticing bots and scrapers that use full WebKit implementations to run JavaScript. Those clients should be triggering Google Analytics just like a desktop browser, but it was difficult to make the correlation due to information hiding in the GA dashboard (a tuple of IP address, timestamp, and resource would be needed, but they do not provide that, so it was impossible to test).
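For what it's worth, the test described above would be simple if those tuples were available. The sketch below is purely hypothetical: it assumes both the server log and an analytics export could be reduced to `(ip, timestamp, resource)` tuples, which is exactly the data the GA dashboard does not expose. The function name and the 5-second tolerance are also assumptions.

```python
def unreported_hits(server_hits, analytics_hits, tolerance_seconds=5.0):
    """Return server-side hits with no matching analytics record.

    Both inputs are iterables of (ip, timestamp_seconds, resource) tuples.
    A match needs the same IP and resource, with timestamps within tolerance.
    """
    analytics_times = {}
    for ip, ts, resource in analytics_hits:
        analytics_times.setdefault((ip, resource), []).append(ts)

    missing = []
    for ip, ts, resource in server_hits:
        candidates = analytics_times.get((ip, resource), [])
        if not any(abs(ts - t) <= tolerance_seconds for t in candidates):
            missing.append((ip, ts, resource))
    return missing


server = [
    ("192.0.2.1", 1000.0, "/article/1"),
    ("198.51.100.7", 1010.0, "/article/2"),
]
analytics = [("192.0.2.1", 1002.0, "/article/1")]
missing = unreported_hits(server, analytics)
```

In this sample the second hit never shows up on the analytics side, so it is the only one returned.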