| So I can't speak for Twitter, but I work on anti-spam at Facebook, and I imagine the problems we face are relatively similar.

It's worth noting that there's a constant barrage of people trying to send varying degrees of spam. It's not like there's An Attack all of a Sudden - just occasionally people close to the HN social network happen to be targeted by something, and it's magnified by the media / hive mind local to us.

> shouldn't Twitter be able to pick these messages up automatically fairly fast

Theoretically, sure. As a human looking at an attack, it's usually pretty easy to pick out "obvious" attributes that they should have been able to catch. But when you're operating at a scale like ours or Twitter's, even stuff that looks obviously indicative of badness often has false positives (posts flagged as spam that are not). The long tail of weird stuff that a billion users do can be pretty crazy. At the same time, the "obvious" attributes of an attack are often very cheap for an attacker to change. Instead, we try to go after more expensive resources (domains, source IPs, etc.).

> after (I assume) hundreds if not thousands of users have flagged them

Sadly, looking at flags on content is not a silver bullet. The signal is very sparse (a given spam post is rarely flagged), and non-spam posts are frequently flagged (religious and political speech are great examples - and they are the worst kind of false positive if you delete them as spam). These problems can be somewhat mitigated if you aggregate flags over a dimension that's expensive for the attacker (domain posted, IP that posted the content, text shingles), but even then the recall isn't necessarily great, and you could still catch e.g. controversial political domains.

> the spammers can't have unlimited IPs

True, though you can rent space on a botnet that has many geographically diverse, real-user IPs.
Also, I imagine a significant chunk of posts to Twitter come from apps, many of which each use a single IP to post tons of content.

> Is there a reason the same techniques used in E-Mail aren't applicable to Twitter?

There's definitely some overlap. I'm not an expert at email anti-spam, but in general it's a relatively different problem. "Traditional" email spam is sent from some random email address on / via a compromised machine or open relay, and seems to be relatively well solved. But it sounds like this Twitter attack was caused by compromised accounts. At least anecdotally, it seems that email vendors are also not great at detecting this kind of attack. For example, my Gmail account (with arguably the best spam protection in the industry?) gets a message every few weeks from some compromised friend's account (i.e. someone had their email password stolen, and the attacker is using it to "legitimately" send mail after authenticating to that email service with the correct password). |
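
To make the "aggregate flags over an expensive dimension" idea above a bit more concrete, here is a minimal sketch in Python. It is not how Facebook or Twitter actually do it. The post fields (`url`, `text`, `flags`), the helper names, and the thresholds are all made up for illustration. It counts flagged posts per domain and per 3-word text shingle, then surfaces keys where enough posts exist and a large share of them were flagged.

```python
from collections import defaultdict
from urllib.parse import urlparse


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def domain_key(post):
    """Hypothetical key function: the host of the URL the post links to."""
    host = urlparse(post["url"]).hostname
    return [host] if host else []


def shingle_key(post):
    """Hypothetical key function: every 3-word shingle of the post text."""
    return shingles(post["text"])


def flag_stats(posts, key_fn):
    """Aggregate flags per key: {key: (num_posts, num_flagged_posts)}."""
    totals = defaultdict(lambda: [0, 0])
    for post in posts:
        for key in key_fn(post):
            totals[key][0] += 1
            if post["flags"] > 0:
                totals[key][1] += 1
    return {key: tuple(counts) for key, counts in totals.items()}


def suspicious_keys(stats, min_posts=3, min_flag_rate=0.5):
    """Keys seen on enough posts with a high enough share of flagged posts.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    return sorted(
        key
        for key, (num_posts, num_flagged) in stats.items()
        if num_posts >= min_posts and num_flagged / num_posts >= min_flag_rate
    )


# Toy data: each individual post is rarely flagged, but flags pile up
# once they are grouped by domain or by shared text.
posts = [
    {"url": "http://spam.example/a", "text": "win a free phone today", "flags": 1},
    {"url": "http://spam.example/b", "text": "win a free phone now", "flags": 0},
    {"url": "http://spam.example/c", "text": "win a free phone here", "flags": 1},
    {"url": "http://news.example/x", "text": "election results are in", "flags": 1},
    {"url": "http://news.example/y", "text": "weather update for today", "flags": 0},
]

print(suspicious_keys(flag_stats(posts, domain_key)))
print(suspicious_keys(flag_stats(posts, shingle_key)))
```

Note that nothing here fixes the false-positive problem: a heavily flagged political domain with enough posts would clear the same thresholds, so output like this is better treated as a signal for further review than as a delete list.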