by reefoctopus 2553 days ago
Teach your students to ensure there’s a delay between requests so they aren’t hammering anyone’s server, and follow the rules in the robots.txt. I’ve scraped more than a billion pages without any issues.
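For anyone who wants to show students what that looks like, here is a minimal sketch in Python using only the standard library. The names `make_robots`, `polite_crawl`, and `example-bot`, and the one-second default delay, are my own assumptions for illustration and not anything from a particular project. The `fetch` argument is whatever function actually downloads a page, so the politeness logic stays separate from the HTTP client.

```python
import time
from urllib.robotparser import RobotFileParser


def make_robots(robots_txt):
    """Parse the text of a robots.txt file that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_crawl(urls, robots, fetch, user_agent="example-bot",
                 default_delay=1.0, sleep=time.sleep):
    """Fetch each allowed URL in order, pausing between requests.

    `fetch` is any callable that takes a URL and returns the page body.
    `user_agent` and `default_delay` are illustrative defaults.
    """
    # Use the site's Crawl-delay if it declares one, else our own default.
    delay = robots.crawl_delay(user_agent) or default_delay
    pages = {}
    for url in urls:
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt, so skip it
        if pages:
            sleep(delay)  # wait before every request after the first
        pages[url] = fetch(url)
    return pages
```

In real use you would download the site's robots.txt once, pass its text to `make_robots`, and hand `polite_crawl` a `fetch` function built on your HTTP library of choice. This sketch assumes all URLs belong to a single site; a crawler spanning several hosts would need one parser and one delay per host.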
3 comments

He asked about legality, not technical difficulty.
Actually, many of the students are technically competent enough to do the scraping, mostly using Python, and I am pretty sure they have learned not to overwhelm web servers.
Just because it’s technically feasible does not mean it’s legal or ethical.
I'd say the act of web scraping alone is almost never unethical if you are careful not to cause undue load on servers. From an ethical, not legal, perspective, I don't see a whole lot of difference between your computer's silicon eyes and your organic eyes just looking at something that's already in plain view.

It might be illegal in some jurisdictions; IANAL, but I think you can just get out of that jurisdiction and scrape away if that is the case. It might violate some ToS, but a ToS isn't law; the consequences of violating a ToS are usually on the order of getting your IP banned.

What you do with the stuff you scraped can be ethical or unethical.

What makes it unethical?

Why should I be treated differently than search engine spiders?

If somebody doesn’t want their site scraped then they can let people know with robots.txt. Get off your high horse.

They never said it was unethical.
Likewise, just because it's not legal, or from some perspective unethical, doesn't mean one should not do it.