by NicoJuicy 1590 days ago
So, no source? Your response is unrelated to the statement at hand.

Think about it: Google has every advantage in respecting robots.txt and nothing to gain by ignoring it.

E.g.:

1) If a media company doesn't want to be crawled, it can say so in robots.txt.

Then they realize their visitor numbers drop, and they remove it again.

Ergo: publishers sue, because they want the advantages without the scraping. That doesn't seem logical to me, since they currently give Google explicit permission to scrape their content.

2) If Google sometimes leaked personal documents protected by robots.txt, it could have a lot of lawsuits on its hands.

Robots.txt is a simple method to not get blamed.

Ignoring robots.txt could literally be a core business liability from my POV.
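To make the opt-out in point 1 concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URL, and the `can_crawl` helper are made up for illustration; this only shows how a well-behaved crawler would read the file, not how Google actually implements it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher that opts out of Googlebot only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"  # placeholder URL
    print("Googlebot allowed:", can_crawl("Googlebot", url))
    print("Other bot allowed:", can_crawl("SomeOtherBot", url))
```

Deleting the `Googlebot` block is all it takes to opt back in, which is the "they remove it again" step.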

---

So please, a source other than gut feeling, as requested before, would be greatly appreciated.

1 comment

My point is that they scrape the web for data because that is their core business.

I'm not sure why robots.txt was even brought up.

So Google respects this file? I say so what.

I'm arguing that while Google has free rein to scrape whatever data it wants, we indie devs are subject to the cider house rules.

Sources can be found for just about any argument, so they are more or less useless.

There is nothing wrong with self-evident truths or reasonable hypotheses. That is how the modern world was created.

A search engine that scrapes the web for data to make a good search engine. Who would've dreamed of it?

We are not privy to what happens behind closed doors at Google. They only work for their shareholders, not for us or the public good.

Here is a source showing that Google does what it wants based on what it thinks the web should be, and that it can change its mind on a whim: https://www.searchenginejournal.com/google-robots-txt-noinde...