by jeroenhd 616 days ago
Most of the good ones will tag themselves in the user agent and follow robots.txt.

The ones that don't are the ones people are trying to block the most. Sometimes Google or Bing go crazy and start scraping the same resource over and over again, but most scraping tools causing load peaks are the badly written/badly configured/malicious ones.
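
For anyone wondering what "tag themselves and follow robots.txt" looks like in practice, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot name `ExampleBot`, and the example.com URLs are all made up for illustration; a real crawler would fetch the file from the site it is crawling.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical self-identifying user agent string for a well-behaved bot.
USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot)"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True if the rules allow our user agent to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


parser = build_parser(ROBOTS_TXT)
allowed = may_fetch(parser, "https://example.com/public/page")
blocked = may_fetch(parser, "https://example.com/private/data")
delay = parser.crawl_delay(USER_AGENT)  # seconds to wait between requests, if set
```

A polite scraper would send `USER_AGENT` in its request headers, skip any URL where `may_fetch` returns False, and sleep for `delay` seconds between requests. The badly behaved ones skip all three steps.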

1 comment

I'm thinking a lot of those issues might be related to "smart" scraping that parses JavaScript. Sites could lean into the bots and make scraping easier for them by removing JavaScript from their pages.

I realize this is somewhat off-topic, but the big companies kind of destroyed the internet with all the JavaScript frameworks and whatnot.