Hacker News
by PolandKid 1632 days ago
@AwkwardPanda

And how does a site opt out of your scraping? Do you have a unique user-agent when you scrape? A set of IPs?

2 comments

Hi, currently there is nothing of this sort. The user agents are random. I have a couple of servers doing the scraping in real time. The IPs are not static.

Let me see if I can build an opt-out list. But wouldn't it defeat the entire purpose of this app?

You should just declare your bot in its user-agent string. Most publishers won't even bother to opt out, but leaving publishers the option is the correct etiquette for any bot. Randomizing user agents is cloaked scraping.
You know the answer to this.
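
For what it's worth, here is a rough sketch of what "declaring your bot" could look like in Python. It is only an illustration: the bot name `ExampleScraperBot`, the info URL, and the helper names `is_allowed` and `build_request` are all made up, and it assumes publishers opt out through robots.txt, which the thread itself does not specify.

```python
from urllib import robotparser
import urllib.request

# Hypothetical bot identity; the name and info URL are placeholders.
BOT_NAME = "ExampleScraperBot"
BOT_USER_AGENT = f"{BOT_NAME}/1.0 (+https://example.com/bot-info)"


def is_allowed(robots_txt: str, url: str, bot_name: str = BOT_NAME) -> bool:
    """Return True if the given robots.txt text lets bot_name fetch url."""
    parser = robotparser.RobotFileParser()
    # Parse the robots.txt body directly, so no network call happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot_name, url)


def build_request(url: str) -> urllib.request.Request:
    """Build a request that sends the fixed, declared user-agent string."""
    return urllib.request.Request(url, headers={"User-Agent": BOT_USER_AGENT})


if __name__ == "__main__":
    # A publisher that opts out of this specific bot.
    blocked = "User-agent: ExampleScraperBot\nDisallow: /\n"
    # A publisher that only hides one directory from all bots.
    partial = "User-agent: *\nDisallow: /private/\n"

    print(is_allowed(blocked, "https://news.example.org/article"))
    print(is_allowed(partial, "https://news.example.org/article"))
    print(build_request("https://news.example.org/article").get_header("User-agent"))
```

With a stable name like this, a publisher can block the bot with two lines in robots.txt, and everyone who doesn't care is unaffected. In a real scraper you would download each site's robots.txt first and pass its text to `is_allowed` before requesting any page.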