by taftster 3349 days ago
If you crawl a site, index it, and then use that for commercial purposes -- all while using Google's trademark to crawl -- yes, you'll probably get a letter from Google.

As for the site owner, it's on them to decide what to do with your traffic. HTTP is an open, extensible protocol, and you can send almost anything in a request that the protocol allows. Having opened their service over HTTP, the site owner is the one who decides how to respond.
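To make that concrete, here is a minimal sketch of both sides: the client picks whatever User-Agent it likes, and the site applies its own rule to it. The crawler name, the URLs, and the `site_policy` function are made up for illustration; nothing is sent over the network.

```python
from urllib.request import Request

# Hypothetical crawler identity; substitute your own name and contact URL.
USER_AGENT = "ExampleCrawler/0.1 (+https://example.com/bot-info)"


def build_request(url, user_agent=USER_AGENT):
    """Build (but do not send) a GET request carrying a self-chosen User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


def site_policy(user_agent, blocked_tokens=("ExampleCrawler",)):
    """Hypothetical server-side rule: return the HTTP status the site chooses."""
    lowered = user_agent.lower()
    if any(token.lower() in lowered for token in blocked_tokens):
        return 403  # the owner decided not to serve this client
    return 200


req = build_request("https://example.com/")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
print(site_policy(req.get_header("User-agent")))
```

The header is only a claim made by the client, which is why the decision about how to treat it sits with the server.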

1 comments

How would you get a letter from Google if you never scrape Google's sites? They would never know.
I was trying to say that if, for example, you were building a search engine to compete with Google but using Google's name to build that service, you'd be in trouble.

How would Google know? They would start by setting up fictitious websites that appear unaffiliated with them. If your crawler were to hit one of those sites, you would reveal yourself. I wouldn't be at all surprised if Google had this kind of "honey pot" sitting out there watching for web crawlers (rogue or otherwise).
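A rough sketch of how such a honeypot check might work, assuming the operator already knows which IP addresses its real crawler uses. The function name, the log format, and the addresses (all from documentation ranges) are hypothetical; this is not a description of anything Google actually runs.

```python
def flag_impostors(honeypot_hits, known_crawler_ips, token="googlebot"):
    """Return IPs that claimed the crawler's name but are not the operator's own.

    honeypot_hits: iterable of (ip, user_agent) pairs from the honeypot's log.
    known_crawler_ips: set of IPs the operator's real crawler uses (assumed known).
    """
    flagged = []
    for ip, user_agent in honeypot_hits:
        claims_name = token in user_agent.lower()
        if claims_name and ip not in known_crawler_ips and ip not in flagged:
            flagged.append(ip)
    return flagged


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

hits = [
    ("192.0.2.10", GOOGLEBOT_UA),          # the operator's own crawler
    ("198.51.100.7", GOOGLEBOT_UA),        # same claim, unknown address
    ("203.0.113.5", "ExampleCrawler/0.1"), # honest about who it is
]
print(flag_impostors(hits, known_crawler_ips={"192.0.2.10"}))
```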

Google likely also has business partnerships with big content producers, and I'm sure it can get reports back from them about its crawling -- to ensure that Google is correctly finding all the content the site owners want found.

As an aside, I used to run such a honeypot website. Web crawler behavior is fascinating. I loved being able to find, detect and classify various forms of web crawlers. Some followed robots.txt, some didn't, and some went directly to robots.txt and then scraped the pages that were meant to be excluded. I wish I had kept the project going and formalized the results.
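For the curious, the three behaviors above could be told apart from an access log with something like the sketch below. The log format, the category labels, and the `/private/` prefix are assumptions for illustration, not what my old project actually did.

```python
def classify_crawlers(hits, disallowed_prefixes):
    """Label each client by how it treated robots.txt.

    hits: list of (client_id, path) pairs in the order they were requested.
    disallowed_prefixes: path prefixes that robots.txt asks crawlers to skip.
    """
    state = {}  # client_id -> [fetched_robots, hit_excluded, hit_excluded_after_robots]
    for client, path in hits:
        flags = state.setdefault(client, [False, False, False])
        if path == "/robots.txt":
            flags[0] = True
        elif any(path.startswith(prefix) for prefix in disallowed_prefixes):
            flags[1] = True
            if flags[0]:
                flags[2] = True  # read the exclusions first, then crawled them anyway

    labels = {}
    for client, (fetched, excluded, excluded_after) in state.items():
        if excluded_after:
            labels[client] = "read robots.txt, then scraped excluded pages"
        elif excluded:
            labels[client] = "ignored robots.txt"
        elif fetched:
            labels[client] = "followed robots.txt"
        else:
            labels[client] = "unknown"  # never fetched robots.txt, never hit excluded paths
    return labels


log = [
    ("a", "/robots.txt"), ("a", "/index.html"),
    ("b", "/private/report.html"),
    ("c", "/robots.txt"), ("c", "/private/report.html"),
    ("d", "/index.html"),
]
for client, label in classify_crawlers(log, ("/private/",)).items():
    print(client, label)
```

The third category is the interesting one: a crawler that treats robots.txt as a map of where to look rather than as a list of places to avoid.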

Even if they do know, what do they have to do with it? Does Google have an exclusive legal claim to its user-agent string?
If the name "googlebot" is trademarked, then yes, they would have a basis for a claim. It would at least be leverage they could use if they believed you were causing them harm in some way.