Hacker News
by blantonl 783 days ago
I don't care if you use Web scraping to solve the Israeli / Palestinian conflict. You're not entitled to anyone's data, computers, services, etc. because you've decided for altruistic reasons that it is appropriate.

Cool use case. Love it. Fascinating stuff. But if Google told you to stop, would you? Or would you instead decide to build a 5-server cluster of 200 4G modems spread across continents to continue your work? Because if you did, I would assume that you've decided to move on from a cute little altruistic process into a commercial use of someone else's data to make a profit.

4 comments

Wait - so you are saying that information on the public internet isn’t public? Man, I wish people would remember the origin of the web and the entire reason it exists. If you don’t want information public, protect it - otherwise, I say it’s fair game.
Remember the OP article is about a system that is designed to completely and directly circumvent protections.

If an organization puts a series of processes in place to prevent scrapers from taking data wholesale in violation of its terms of service, and you develop a 5-server cluster of 200x 4G modems to get around them, it's no longer "fair game" and you're being directly unethical in your use of someone else's services.

Yeah, I think it's fair to say that in the presence of anti-bot measures (whether they work or not), the content on the website isn't public anymore.

Available to someone meeting certain criteria (student discount, senior discount) doesn't mean available to anyone. I see no reason why "not available to be consumed by autonomous agents" would be any less valid a restriction than unlimited refills being available only to humans and not robots.

I agree that there is a line at using someone else’s data to make a profit, but it is kind of ironic that you mention Google, because their exact business model is scraping websites to feed their search results and littering them with ads to make a profit. For me there is a big line between aggregating publicly available data (search results, reviews, news, job postings, etc.) and intentionally violating terms of service, like signing up for fake accounts and harvesting user data. So entitled, maybe not (sites can try to prevent you from scraping), but if you make something publicly available you shouldn’t be surprised when people use it in ways you may not originally have intended (within legal boundaries, of course).
> I don't care if you use Web scraping to solve the Israeli / Palestinian conflict.

Maybe you should though. It's always worth it to think about which giant's shoulder you're standing on. It's giants all the way down.

> cute little altruistic process

Maybe it is not the opinion which is unpopular, but the way it is being presented.