by Kodiack 134 days ago
If you have a "legitimate scraping pursuit", identify yourself appropriately that way. I'm happy to let most well-behaved scrapers access my content.

Hiding behind a residential proxy and using random user agents? Gross. Learn what consent is.
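For illustration only, here is a rough sketch of what "identify yourself appropriately" could look like in practice: a descriptive User-Agent plus a robots.txt check before fetching. The bot name, contact URL, and sample robots.txt are made-up placeholders, not anything from this thread, and the check works on an already-downloaded robots.txt body so no network access is needed.

```python
from urllib import robotparser

# Hypothetical bot identity: the name, version, and contact URL are placeholders.
USER_AGENT = "ExampleIndexBot/1.0 (+https://example.com/bot-info)"


def build_headers(user_agent: str = USER_AGENT) -> dict:
    """Return request headers that state plainly who is making the request."""
    return {"User-Agent": user_agent}


def is_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Check a URL against the text of an already-fetched robots.txt file."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up robots.txt used only to exercise the check.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for path in ("/public/page", "/private/page"):
        url = "https://example.com" + path
        print(url, "allowed" if is_allowed(SAMPLE_ROBOTS, url) else "disallowed")
```

A real crawler would also want rate limiting and a working contact address behind that URL; this only covers the identification and robots.txt parts.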

1 comment

Try scraping any of the major players (e.g. Amazon) without a residential proxy: it won't work. I appreciate that you are offering to abide by crawling etiquette (e.g. robots.txt), but no major app supports that any more.

You're thinking about the case of big AI companies crawling your blog. I'm talking about a small startup that is trying to do traditional indexing and needs to run from a residential proxy to make it work.