by arp242 849 days ago
If the data is public or semi-public then chances are I consented to display that data there. I consented to it being used on that site, for the purpose of that site. Not for random other companies.

And most scraping isn't done by users. It's done by companies. For profit. Often for less than enlightened reasons.

LinkedIn is a good example: I want my data displayed to people on that job site. I don't want it harvested by every recruiter under the sun who will then spam me. I certainly don't want that data sold between those recruiters long after I deleted my account on LinkedIn. Tinder and sites like that are also an obvious example: yes it's (semi-)public, but I also wouldn't want it to be scraped and harvested by some company – I just want it to be shown temporarily to a limited set of people.

1 comments

In general, I don't think people should have a moral right to decide where and how the data that they made public is used, or to decide if it can get scraped or not.

And, in general, I take the fact that you published something on the Internet as tacit moral consent for the rest of the world to use it how they want.

This comes with a couple of big asterisks, because (1) copyright law exists, and I generally try not to break the law, even if I don't agree with it. But the discussion in this thread is mostly separate from copyright: for instance, I don't think a court would see someone scraping and redistributing data from someone's LinkedIn profile as a copyright infringement case.

And (2) because I think that in some specific cases, using published data can be morally wrong, but not as a general rule.

i somewhat agree; people volunteer it when posting anything online. but they also volunteer their advertising id on their phones (even if they don't know it) - just as they don't know (and don't care) that they are the product when on websites like facebook

i feel the 'antibot' stuff is more related to the adtech industry vs site-scrapers - remember getting a dedicated server and having friends click on links just to pay for it? with Geocities and all these free websites, the biggest costs were bandwidth and storage (not that they aren't now)

since the AI Boom, there's just more hype over people wanting 'credit' (or money) for something they posted on a forum X-units of time ago.

it's called the World Wide Web for a reason.. keep it open, even if it is to 'a bot' - you never know when somebody's 'bot software' is reading your webpage for somebody who has some disadvantage and needs assistance