Hacker News
by zexodus 843 days ago
Every single time I see these scraping discussions I get the same thoughts:

Businesses use data from the user. The business does additional crunching on that data to derive new, interesting data for the user. Who owns the data? The user or the app?

At the very least the user partially owns the data and as such, I'd argue that the user should have the right to share the data between different applications however they see fit. However, businesses tend to think that they somehow have the legal (moral even?) right to keep that data in their walled gardens. For as long as this (imo unfair) stance is common, I think that data extraction by use of these anti-bot-bypassing technologies is fair game.

2 comments

If the data is public or semi-public then chances are I consented to display that data there. I consented to it being used on that site, for the purpose of that site. Not for random other companies.

And most scraping isn't done by users. It's done by companies. For profit. Often for less than enlightened reasons.

LinkedIn is a good example: I want my data displayed to people on that job site. I don't want it harvested by every recruiter under the sun who will then spam me. I certainly don't want that data sold between those recruiters long after I deleted my account on LinkedIn. Tinder and sites like that are also an obvious example: yes it's (semi-)public, but I also wouldn't want it to be scraped and harvested by some company – I just want it to be shown temporarily to a limited set of people.

In general, I don't think people should have a moral right to decide where and how the data that they made public is used, or to decide if it can get scraped or not.

And, in general, I take the fact that you published something on the Internet as tacit moral consent for the rest of the world to use it however they want.

This comes with a couple of big asterisks, because (1) copyright law exists, and I generally try not to break the law, even if I don't agree with it. But the discussion in this thread is mostly separate from copyright: for instance, I don't think a court would see someone scraping and redistributing data from someone's LinkedIn profile as a copyright infringement case.

And (2) because I think that in some specific cases, using published data can be morally wrong, but not as a general rule.

i somewhat agree; people volunteer it when posting anything online. but they also volunteer their advertising id on their phones (even if they don't know it) - just as they don't know (and don't care) that they are the product when on websites like facebook

i feel the 'antibot' stuff is more related to the adtech industry than to site scrapers - remember getting a dedicated server and having friends click on links just to pay for it? with Geocities and all those free websites, the biggest costs were bandwidth and storage (not that they aren't now)

since the AI boom, there's just more hype over people wanting 'credit' (or money) for something they posted on a forum X units of time ago.

it's called the World Wide Web for a reason. keep it open, even if it is to 'a bot' - you never know when somebody's 'bot software' is reading your webpage for somebody who has some disadvantage and needs assistance

the growing IP centipede is an interesting GIGO dilemma