by fragmede 994 days ago
Bots and spam are an impossibly hard problem to crack. Google had to change the digital landscape of email in order to fight spam, and even then, the job is never finished.

The worst part though is knowing that legitimate users will get caught as collateral damage.

> How would we even know who's accessing HN unless they tell us?

My browser sends a cookie telling HN it's me. More advanced tooling would let you allow-list aged accounts with > 1000 karma, while blocking a different subset. Of course, once that becomes known, the attacking botnet will just use aged accounts with > 1000 karma, so it's a game of cat and mouse.
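For illustration, that kind of allow-list rule could be sketched roughly like this. It's only a sketch: the `Account` type, the `is_allow_listed` function, and the one-year cutoff for "aged" are all made up here, and none of it reflects how HN actually works.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds. The comment only says "aged" and "> 1000 karma";
# the one-year age cutoff is an assumption made for this example.
MIN_KARMA = 1000
MIN_ACCOUNT_AGE_DAYS = 365


@dataclass
class Account:
    username: str
    karma: int
    age_days: int


def is_allow_listed(account: Optional[Account]) -> bool:
    """Return True if the request should skip the bot checks."""
    if account is None:
        # No login cookie: nothing to allow-list, fall through to other checks.
        return False
    return account.karma > MIN_KARMA and account.age_days >= MIN_ACCOUNT_AGE_DAYS
```

The weakness is the one already mentioned: the rule is only as good as the secrecy of its thresholds, since an attacker who learns them can acquire accounts that pass.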

What this really speaks to though is that HN has now garnered the attention of a sufficiently motivated attacker that more advanced technology is required to block them. Fighting it yourself takes away from time spent on moderation, among other things. Maybe it's one attacker and they'll get bored after their attempts prove fruitless, but maybe they won't. Either way, this is why Cloudflare's bot shield and others like it are so popular. A reCAPTCHA in order to submit a comment wouldn't be the worst thing, though I'm sure there will be many loud, shouty voices against it. That's unfortunately the nature of running any popular site on the Internet these days.


> My browser sends a cookie telling HN it's me

Yes, that's what I mean: if people log in, then we know at least a bit about who's accessing the site. But the particular blocks I posted about above only apply to logged-out users. Logging in immunizes you from them immediately.
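As a rough sketch of the rule described here (not HN's actual code; `BLOCKED_IPS` and `should_block` are hypothetical names, and the address comes from a documentation-only range), the login check runs before the IP check:

```python
# Hypothetical block list; 203.0.113.0/24 is reserved for documentation.
BLOCKED_IPS = {"203.0.113.7"}


def should_block(ip: str, logged_in: bool) -> bool:
    """Apply the IP block only to logged-out requests."""
    if logged_in:
        # A valid login cookie bypasses the IP block entirely.
        return False
    return ip in BLOCKED_IPS
```

With this ordering, `should_block("203.0.113.7", logged_in=True)` is `False` even though the address is on the list, which is the sense in which logging in immunizes you. Account-level blocks would be a separate check and are left out here.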

Or rather, presumably Hector Martin is connecting to HN via a logged-in browser and experiencing the block, which shouldn't apply to logged-in users, so I'm guessing there's a bug/disconnect somewhere (could be in my parsing of your original comment).
No one connecting via a logged-in browser would have been blocked by this code.

Edit: there are two exceptions—accounts we blocked because they were running crawlers that didn't respect HN's robots.txt—but both have been blocked for much longer than a few days.

In this post* Hector Martin makes a contradictory claim: that he was blocked while using a logged-in browser.

* https://social.treehouse.systems/@marcan/111165508206292497

Based on other posts in that thread though, he also appears to be behind CG-NAT, which is always a confounding factor for IP-based blocking. Maybe someone else on his netblock is running that crawler.
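To make the CG-NAT point concrete, here's a toy sketch. The subscriber names, the `NAT_TABLE` mapping, and the addresses (documentation ranges) are all invented for illustration. Several subscribers leave the carrier's network through the same public IP, so a block on that IP hits all of them:

```python
# Hypothetical mapping of subscriber -> public IP as seen by the website.
# Under CG-NAT, many subscribers share one public address.
NAT_TABLE = {
    "crawler_operator": "198.51.100.20",
    "ordinary_reader": "198.51.100.20",
    "unrelated_user": "192.0.2.55",
}


def affected_by_block(blocked_ip: str, nat_table: dict) -> list:
    """List every subscriber whose traffic exits through the blocked IP."""
    return sorted(user for user, ip in nat_table.items() if ip == blocked_ip)
```

Blocking `198.51.100.20` because of the crawler also catches `ordinary_reader`, since the site cannot tell the two apart by address alone.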

If someone wants to tell me the username, I'd be happy to look into what happened. Without the username, I don't know of any way to check this particular case—all I can say is what changed during those few days, and what changed is that we blocked more IPs that were making logged-out requests; logged-in requests would not have been affected.

Since that link refers to opening HN in an incognito window—and all those requests would be logged-out—most probably it was that activity that triggered the block. As I think I said elsewhere, it's hard to distinguish between a legit user accessing a bunch of HN links in various tabs, and a distributed botnet making similar handfuls of requests from a million different IP addresses.

What I can tell you for sure, though, is that the claim that we were targeting any individual user is quite false. Isn't that the main point?

Oh, so a DDoS, not a bot attack.