by fjabre
1595 days ago
You are stating that Google has never acted in bad faith and that robots.txt is the only thing that Google looks at when crawling/scraping the web. You’re a smart guy. Surely you must know how ridiculous that sounds on the face of it. It is common sense. The sky is blue. Source: Look up at the sky.
Think about it: Google has everything to gain by respecting robots.txt and nothing to gain by ignoring it.
E.g.
1) If a media company doesn't want to get crawled, it can add a disallow rule to robots.txt.
Then they'll see their visitor numbers drop and remove it again.
Ergo: publishers sue, because they want the advantages without the scraping. That doesn't seem logical to me, since they currently give Google explicit permission to scrape their content.
2) If Google sometimes leaked personal documents protected by robots.txt, it could have a lot of lawsuits on its hands.
Respecting robots.txt is a simple way to avoid getting blamed.
Ignoring robots.txt could literally be a core business liability from my POV.
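To make point 1 concrete, here is a rough sketch of what opting out looks like, checked with Python's standard urllib.robotparser. The rules and the example.com URL are made up for illustration, and this only shows how a well-behaved crawler would read the file, not what Google actually runs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher that blocks Googlebot only.
rules = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse in memory, no network request

# A compliant crawler asks before fetching each URL.
googlebot_allowed = parser.can_fetch("Googlebot", "https://example.com/article")
otherbot_allowed = parser.can_fetch("OtherBot", "https://example.com/article")
print(googlebot_allowed, otherbot_allowed)
```

Removing the Googlebot block is all it takes to opt back in, which is the choice publishers are free to make.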
---
So please, a source other than gut feeling, as requested before, would be greatly appreciated.