by kelnos
228 days ago
> robots.txt is a polite request to please not scrape these pages

People who ignore polite requests are assholes, and we are well within our rights to complain about them.

I agree that "theft" is too strong (though I think you might be presenting a straw man there), but "abuse" can be perfectly apt: a crawler hammering a server, requesting the same pages over and over, absolutely is abuse.

> Likewise if enforcing a rule of no scraping is of utmost importance you need to require an API token or some other form of authentication before you serve the pages.

That's a shitty world that we shouldn't have to live in.
If you are building a new search engine and the robots.txt only allows Google, are you an asshole for indexing the information anyway?
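For concreteness, here is a minimal sketch of what a "Google-only" robots.txt might look like and how a well-behaved crawler would interpret it, using Python's standard-library `urllib.robotparser`. The file contents, the `example.com` URL and the `NewSearchBot` user agent are all hypothetical, chosen only to illustrate the scenario.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Google's crawler may fetch everything,
# every other user agent is disallowed from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse in-memory lines instead of fetching over HTTP

url = "https://example.com/some/page"
for agent in ("Googlebot", "NewSearchBot"):  # "NewSearchBot" is a made-up crawler name
    print(agent, parser.can_fetch(agent, url))
# prints:
# Googlebot True
# NewSearchBot False
```

Under a file like this, a new engine's crawler that honors robots.txt is told to stay out entirely, while the incumbent is let in. That asymmetry is exactly the dilemma the question raises.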