by ebtalley 5067 days ago
data would be
* IP addresses, one can assume a bot would only have a set of addresses they could use, barring botnets.
* request patterns, i.e. did the bot request css/js, etc.
* request timeframes
* UA strings

Sure, it's a big data problem, but I can imagine that Facebook has solved these types of scenarios many times over.
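
To make the signals listed above concrete, here is a rough sketch of how they might be checked against a single visitor's requests. Everything in it is an assumption for illustration: the `ClickSession` record, the `suspicion_signals` function, the thresholds, and the idea that evenly spaced requests look machine-driven. It is not how Facebook or anyone else actually does it.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ClickSession:
    """Hypothetical record of one visitor's requests around an ad click."""
    ip: str
    user_agent: str
    requested_paths: List[str]
    request_times: List[float]  # seconds, ascending


def suspicion_signals(session: ClickSession,
                      known_bad_ips: Set[str],
                      max_jitter_seconds: float = 0.05) -> List[str]:
    """Return the names of the signals that fired for this session."""
    signals = []

    # IP addresses: the visitor comes from an address already seen misbehaving.
    if session.ip in known_bad_ips:
        signals.append("ip")

    # Request patterns: browsers normally fetch css/js; simple bots often skip them.
    if not any(p.endswith((".css", ".js")) for p in session.requested_paths):
        signals.append("no_assets")

    # Request timeframes: near-identical gaps between requests (assumed heuristic).
    times = session.request_times
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) >= 3 and max(gaps) - min(gaps) < max_jitter_seconds:
        signals.append("timing")

    # UA strings: an empty User-Agent is the crudest possible check.
    if not session.user_agent.strip():
        signals.append("ua")

    return signals
```

A real system would weigh these signals across many sessions rather than trust any single one, since each is easy to fake on its own.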

1 comments

What if you start a new Amazon EC2 spot instance (netting you a new IP address), start up Chromium in headless mode (say, using Xvfb), navigate to the website of choice, use mouse automation to start clicking around, click the ad, spend 5 minutes clicking around in a semi-choreographed pattern on the advertisee's website, and then shut down the instance -- only to repeat?

Moreover, Amazon is always buying new IP subnets.

It sounds like you don't need to go to that much hassle currently, but even that rigmarole is simple enough to combat. The user account should be real, the usage real (comments, photos, messages back and forth), and the friends also real. False-positive spam IDs are OK: they will lower your revenue but won't constitute fraud against your customers. Put up a test for users you think are spamming; the test they already do of identifying photos of your friends would be a good one.

Large numbers of real-looking fake accounts should be hard to keep up.

The user from that new IP won't have any real, human-like history: photos shared and commented on over time, etc.
But why would you go through all that to click on ads? If I click on an ad for "Some Record Company" how does that make me money?
It doesn't always need to make you money; sometimes you might just want it to cost your competitor money.
I don't count clicks from any amazonaws or ec2 hostnames on my site.
Then pay Amazon for a list of their EC2 IPs, or obtain that information from a public source (e.g. RIPE, ARIN).
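
For what it's worth, both filters mentioned above (by hostname and by IP range) could be sketched roughly like this. The function names are made up, and the CIDR blocks are reserved documentation ranges standing in for whatever list you actually obtain; they are not real EC2 ranges.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), NOT real EC2 ranges.
# Swap in the list you obtained from Amazon or a registry.
EXAMPLE_CLOUD_CIDRS = ["192.0.2.0/24", "198.51.100.0/24"]


def parse_networks(cidrs):
    """Turn CIDR strings into network objects once, up front."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_cloud_ip(ip, networks):
    """True if the address falls inside any of the listed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


def is_amazonaws_hostname(hostname):
    """True if a reverse-DNS hostname is under amazonaws.com."""
    host = hostname.lower().rstrip(".")
    return host == "amazonaws.com" or host.endswith(".amazonaws.com")


def countable_clicks(clicks, networks):
    """Keep only (ip, hostname) clicks that match neither filter."""
    return [
        (ip, hostname)
        for ip, hostname in clicks
        if not is_cloud_ip(ip, networks) and not is_amazonaws_hostname(hostname)
    ]
```

The hostname check alone misses any address without a matching reverse DNS entry, which is why the IP-range check is worth having as well.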