by dvt 1 day ago
Google started indexing copyrighted data without consent in 1999, Yahoo in 1994. Absolute delusion to think that ChatGPT is the one that broke consent.

I think there really is a fair difference between pure indexing and reoffering. I also don't think the way Google currently operates is anywhere close to pure indexing anymore: programs of theirs like AMP and the News tab specifically keep visitors from reaching the sites, instead of serving as a visibility boost for them.

I am sure there are people who'd object to even being indexed but most niche communities were pretty rabid about getting more visibility to find more members.

There used to be a link next to every Google search result where you could view Google’s cached copy of that page. Also, Google News used to have a snippet from each article. You also used to be able to read a lot more of an indexed book on Google Books.

They’re gone now, presumably to appease copyright holders.

Also, YouTube was built on people uploading copies of commercial video.

> I think there really is a fair difference between pure indexing and reoffering.

Following the same logic, is there a fair difference between pure training and reoffering?

> I am sure there are people who'd object to even being indexed but most niche communities were pretty rabid about getting more visibility to find more members.

Here the logic seems to be: it's OK as long as they derive enough benefit from it to look past it.

> Following the same logic, is there a fair difference between pure training and reoffering?

This is the kind of hair-splitting that I was trying to avoid, because at the end of the day there is no functional difference. Is MD5 okay? Maybe Markov chains? Just a very simple one-layer perceptron? Once you take someone's copyrighted work and you do anything with it without consent, you're breaking some implicit trust.

However, obviously there's a lot of tension here: free speech, transformative works, copyright owners, profit-making, etc. That's why I don't think it's really that important to figure out exactly what consent was broken and when; what matters is being forward-looking and planning for what might come next.

> Once you take someone's copyrighted work and you do anything with it without consent, you're breaking some implicit trust.

While I agree there's a parallel, do consider what that trust is with regard to putting up an HTTP server. It's kind of like handing out flyers you made yourself. The server is yours and you're handing out your content on your own. Someone goes around accepting such flyers and putting them in their pocket (an HTTP cache, maybe a browser's, maybe an indexer's). Then somebody asks them where they might find a barber, they remember that one of those flyers was about barbers, and they show them the flyer or part of it.

What implicit trust was broken? This is HTTP, the online equivalent of handing out flyers.

Part of the problem here is that copyright is quite a broken concept. That's why it has such big wiggle room, like "fair use" and such.

Funny example, because if you create a flyer, you own the copyright to said flyer :) So if you create a flyer, then if someone else uses that flyer to make money, you can sue them and you will win in court (unless the derived work is transformative, critiques it, yadda yadda). And this is the kind of hair-splitting that can get you into trouble, because I think it's clear that ChatGPT's training is more transformative than Google's indexing/PageRank, yet we're somehow more upset at the former than we are at the latter.

> then if someone else uses that flyer to make money, you can sue them and you will win in court (unless the derived work is transformative, critiques it, yadda yadda).

Not a lawyer, but 1) consider that sharing flyers is the normal use of the thing, and it's precisely what the business wants, to have their marketing spread to whoever needs them. 2) Doubt there's a law that whatever permission you implicitly granted by personally handing your work is revoked as soon as they find a way to make money from it. 3) This is kind of Groupon's business model: to group coupons (which business flyers also typically are). It benefited the businesses and I don't think Groupon needed to pay the businesses for a copyright license to do them the favor of furthering their marketing. Rather Groupon got better deals from the businesses because Groupon had better reach. They had better reach because people that wanted coupons to save money could just buy the little booklet instead of driving around collecting one or two at different locations.

I was defending your point. Google's indexing was a valid parallel.

I saw, I'm just re-emphasizing/clarifying.

How does copyright maximalism promote the progress of science and the useful arts?

The pendulum has swung so high that it's going to break on the return.

Hollywood itself got started by circumventing the film-making patents, so this goes back a fair way. Get funded, break the laws: that seems to be the paradigm.