by CobrastanJorji 3343 days ago
I feel like it's a good thing to maintain a certain level of professional ethics, and, while it depends on the specifics of the situation, I'd suggest that falsely claiming to third parties to be something you aren't, in order to do something they don't want you to do, generally falls short of that ethical bar.

Say your bot misbehaves and effectively starts DoSing a site with a whole lot of pages, like a small Reddit clone or something. And say Reddit doesn't have another way to distinguish between your bot and the Googlebot. You have now put Reddit in a position where they have to either block the Googlebot (and possibly lose a huge pile of money in the process) or else buy up a lot more hardware and bandwidth to pay for your crawler as well. That's not cool, to put it bluntly.

2 comments

Not to detract from your point, but blindly blocking a User Agent because of a bad actor and losing money is not a good solution.

A more robust solution can be coded using information from Google: https://support.google.com/webmasters/answer/80553?hl=en

I would hope the people at Reddit are smart enough to check an IP and not just a user-agent.
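
For what it's worth, the check described at that Google link, as I understand it, is: do a reverse DNS lookup on the requesting IP, confirm the hostname is under googlebot.com or google.com, then do a forward DNS lookup on that hostname and confirm it maps back to the same IP. A minimal sketch of that, assuming Python and IPv4 only; the function name and the injectable lookup parameters are my own, just for illustration:

```python
import socket

# Domains a genuine Googlebot hostname should fall under (per the linked page).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse_dns(ip):
    # PTR lookup: returns the primary hostname for the IP.
    return socket.gethostbyaddr(ip)[0]


def _forward_dns(host):
    # A-record lookup: returns the list of IPv4 addresses for the hostname.
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(ip, reverse_lookup=_reverse_dns, forward_lookup=_forward_dns):
    """Return True only if the IP passes the reverse-then-forward DNS check.

    The lookup functions are parameters (a hypothetical design choice) so the
    logic can be exercised without touching the network.
    """
    try:
        host = reverse_lookup(ip)
    except OSError:
        # No PTR record or resolver failure: cannot verify.
        return False

    host = host.rstrip(".").lower()
    if not host.endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # The forward lookup must map back to the original IP; otherwise
        # anyone controlling their own PTR record could pass the check.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

You would only run this for requests whose user-agent claims to be Googlebot, and probably cache the result per IP, since two DNS lookups per request is not free.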