Hacker News
by bediger4000 4834 days ago
I've been wondering about this sort of thing (overwhelming a bad actor with bogus responses) lately. I have PHP scripts in place that send Yandex, Ahrefs and Cyveillance a semi-random HTML file in response to any request. Those semi-random HTML files just lead Yandex, Cyveillance and other bad actors down a never-ending rabbit hole of URLs that serve up more semi-random content.
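For anyone who wants to picture it, here is a minimal sketch of the idea in Python rather than the PHP the scripts are actually written in. The names `tarpit_page` and `is_targeted`, the `/trap/` URL prefix, the word list and the User-Agent tokens are all assumptions made for illustration, not the real setup.

```python
import hashlib
import random

# Hypothetical filler vocabulary; any word list would do.
WORDS = ["lambda", "combinator", "reduction", "graph", "normal", "form", "term", "basis"]


def is_targeted(user_agent: str, targets=("YandexBot", "AhrefsBot")) -> bool:
    """Return True if the User-Agent contains one of the tokens we want to feed junk to."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in targets)


def tarpit_page(path: str, n_links: int = 5, n_words: int = 40) -> str:
    """Build a semi-random HTML page whose links point at more generated pages."""
    # Seed the generator from the requested path, so the same URL always
    # yields the same page and the site looks static to the crawler.
    seed = int.from_bytes(hashlib.sha256(path.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    text = " ".join(rng.choice(WORDS) for _ in range(n_words))
    links = "\n".join(
        f'<a href="/trap/{rng.getrandbits(64):016x}.html">{rng.choice(WORDS)}</a>'
        for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p>\n{links}\n</body></html>"
```

A request handler would call `is_targeted` on the User-Agent header and, on a match, return `tarpit_page(request_path)` instead of the real file. Every link in that page leads to another generated page, so the crawl never bottoms out.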

What if some significant fraction of web servers did this? Wouldn't that make trolling for "IP theft" like Cyveillance does into an economically unfeasible activity?

What if nearly everyone pressed 1 when "Ann from Account Services" or "Rachel from Cardmember Services" calls, and then talked to the service rep for as long as possible?

4 comments

The only problem that I can see is that "overwhelming a bad actor with bogus responses" is a subset of "overwhelming an actor with bogus responses". If this automated technique gets pointed at a legitimate business through error, malice or trickery (as per http://en.wikipedia.org/wiki/Swatting ) then that would be rather bad.
I agree. But it takes a large number of individual web site administrators getting upset enough to configure their HTTP servers to send bogus responses before anyone is overwhelmed. That's the idea's greatest problem, and also the factor that keeps the idea from being turned on legit businesses.

In the case of web servers, bad actors like Ahrefs often ask for things that look vaguely like known security problems: issues in PHP-based BBS software, for example. Ahrefs asks for something and they get some data back. Is it my fault that they don't get back data with the exact semantics they wanted? No, as I am not a magician.

We do a similar thing when (annoying) sales people call. You can try to get rid of them as soon as possible. But it is more annoying to them to keep them occupied as long as possible. When I'm disturbed during dinner by ANOTHER newspaper sales call I'll first politely try to say I'm not interested. If they don't take the hint I change tactics and say 'You know, that DOES sound interesting. Let me talk to my wife for a moment' and put the phone down and go back to dinner. 10 minutes later you can just hang up the phone.
Why have you mixed Yandex into “bad actors”?
For years, they requested files from my web site. I got single digit referrals from Yandex over those same years. So, I went to yandex.com and looked up some of the things my web site has info on (combinatory logic, for example). I got really spammy and scammy links from yandex on those subjects, and others that I've tried.

I just used my own judgement on it.

Sometimes Google gives spammy links as answers too. So why not block it altogether?
The key here is "sometimes" versus "all". In my estimation, Yandex gave nothing but spammy or scammy looking links, and certainly nothing worthwhile. So, I decided to futz with them.
Why are Cyveillance a "bad actor"?
They never ask for "robots.txt", and then they download your entire site every month, for starters. Further, they lie about who they are. They send a User Agent string that doesn't reflect that it's a bot doing the downloading. The User Agent string claims to be Internet Explorer on a Windows box, yet p0f recognizes the requests as from a Linux TCP/IP stack.
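The User Agent mismatch can be checked mechanically. Here is a rough Python sketch, assuming you already have an OS label for the connection from a passive fingerprinting tool such as p0f. The function names and the way the label is passed in as a plain string are illustrative assumptions, not p0f's actual interface.

```python
def ua_claimed_os(user_agent: str) -> str:
    """Guess which OS family a User-Agent string claims to be running on."""
    ua = user_agent.lower()
    if "windows" in ua:
        return "windows"
    # Android devices run a Linux TCP/IP stack, so treat both as linux.
    if "linux" in ua or "android" in ua:
        return "linux"
    if "mac os x" in ua:
        return "mac"
    return "unknown"


def looks_spoofed(user_agent: str, fingerprint_os: str) -> bool:
    """Return True when the claimed OS contradicts the fingerprinted OS label.

    fingerprint_os is a free-form label such as "Linux 2.6.x" (hypothetical
    input here; obtain it however your fingerprinting tool reports it).
    """
    claimed = ua_claimed_os(user_agent)
    observed = fingerprint_os.lower()
    if claimed == "unknown" or not observed:
        return False  # not enough information to call it a mismatch
    return claimed not in observed
```

For example, `looks_spoofed("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", "Linux 2.6.x")` returns `True`. A mismatch is only a hint, since a proxy or NAT box in between can make an honest client look inconsistent, so it is best combined with other signals such as never fetching "robots.txt".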

Trolling for "intellectual property" infringement on behalf of third parties also seems like a scummy line of work to me. It's in Cyveillance's interest to find infringements, so there's no economic reason for them to get such findings correct.

So, I conclude they're a bad actor.

I had never heard of Cyveillance, but after googling I'm guessing these are the reasons: http://en.wikipedia.org/wiki/Cyveillance#Criticisms