From personal experience, it's quite the headache. Even if you stay within legal parameters, you will run into site owners who are less than thrilled about what you're doing (possibly understandably so). I ran into several people who sent cease-and-desist letters, which I honored, and several others who started banning our IP addresses, disallowing us specifically via robots.txt, and so on. There are obviously ways to get around these issues, but the main question is: morally, would you want to go around them? Are you willing to go against website owners who flat out don't want you scraping their data? Would you be willing to fight them legally for your right to do so? Ultimately, that's what it came down to for me; I just felt really crappy about it and stopped.
Personally, I feel that inclusion in Google constitutes public access to the data. As long as I'm not logged into an account on their system, I feel ethically justified in scraping their data.
In other words, I do not feel compelled to respect robots.txt if that file does not also block Googlebot.
Legally it may be another issue, but ethically I consider inclusion in Google an announcement that this information is public.
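robots.txt rules are grouped per user agent, so whether a file "also blocks Googlebot" is something you can check mechanically. Here is a minimal sketch using Python's standard `urllib.robotparser` that asks the same file whether a URL is allowed for Googlebot and for your own crawler. The user agent `ExampleScraper`, the `example.com` URL, and the sample rules are hypothetical. Python's parser uses simple prefix matching and applies the first matching group, so for more complex files (wildcard paths, for example) its answer may not match how Google itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything, everyone else is blocked.
SAMPLE_ROBOTS = [
    "User-agent: Googlebot",
    "Allow: /",
    "",
    "User-agent: *",
    "Disallow: /",
]


def robots_verdict(robots_lines, url, our_agent, reference_agent="Googlebot"):
    """Return (reference_agent_allowed, our_agent_allowed) for one URL.

    Parses robots.txt lines already in memory (no network access) and
    evaluates the same URL for two user agents.
    """
    rp = RobotFileParser()
    rp.parse(robots_lines)  # parse() also marks the rules as loaded
    return rp.can_fetch(reference_agent, url), rp.can_fetch(our_agent, url)


if __name__ == "__main__":
    google_ok, ours_ok = robots_verdict(
        SAMPLE_ROBOTS, "https://example.com/page", "ExampleScraper"  # hypothetical agent
    )
    print(f"Googlebot allowed: {google_ok}, ExampleScraper allowed: {ours_ok}")
```

A result of `(True, False)` is exactly the case described above: the site lets Google index the page while disallowing other crawlers.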