by r_singh
225 days ago
It's fair to be angry at abuse and "aggressive bots", but it's important to remember that most large platforms, including the ones being scraped, built their own products on scraping too. I run an e-commerce-specific scraping API that helps developers access SERP, PDP, and reviews data. I've noticed the web already has unspoken balances: certain traffic patterns and techniques are tolerated, others clearly aren't. Most sites handle reasonable, well-behaved crawlers just fine. Platforms claim ownership of UGC and public data through dark patterns and narrative control. The current guidelines are a result of supplier convenience, and there are several cases where absolutely fundamental web services run by the largest companies in the world breach those guidelines themselves (including those funded by the fund running this site). We need standards that treat public data as a shared resource with predictable, ethical access for everyone, not just for those with scale or lobbying power.
Not everyone has the budget for unlimited bandwidth and compute, and in several of my clients' cases bot traffic has been >95% of all traffic.
People running these bots with AI/VC capital are just script kiddies who forgot that not every site is a boatload of app servers behind Cloudflare.