Hacker News
by john_strinlai 1 day ago
>Consent needs to be a core concept of it. If people don't want to use it, respect that opinion.

this was gone before chatgpt was even a twinkle in someone's eye.

"maybe later" replaced "no" on popups. automatically being opted-into mailing lists when ordering pizza or whatever (pizza hut is the worst). B2B emails that have size 3 font with a random word selected that i have to put in the subject line to unsubscribe from the spam. updates that turn on settings i have deliberately turned off. privacy policies changing on a whim that you "automatically accept by using the service" but logging in to delete your account counts as "using the service". etc.

there are a million+ examples of tech companies ignoring any concept of consent going back at least 20 years.


Google started indexing copyrighted data without consent in 1999, Yahoo in 1994. Absolute delusion to think that ChatGPT is the one that broke consent.

I think there really is a fair difference between pure indexing and reoffering. I also don't think the way Google currently operates is still anywhere close to pure indexing - programs of theirs like AMP and the news tab keep visitors away from sites instead of serving as a visibility boost for them.

I am sure there are people who'd object to even being indexed but most niche communities were pretty rabid about getting more visibility to find more members.

There used to be a link next to every Google search result where you could view Google’s cached copy of that page. Also, Google News used to have a snippet from each article. You also used to be able to read a lot more of an indexed book on Google Books.

They’re gone now, presumably to appease copyright holders.

Also, YouTube was built on people uploading copies of commercial video.

> I think there really is a fair difference between pure indexing and reoffering.

Following the same logic, is there a fair difference between pure training and reoffering?

> I am sure there are people who'd object to even being indexed but most niche communities were pretty rabid about getting more visibility to find more members.

Here the logic seems to be: it's OK as long as they derive enough benefit from it to look past it.

> Following the same logic, is there a fair difference between pure training and reoffering?

This is the kind of hair-splitting that I was trying to avoid, because at the end of the day there is no functional difference. Is MD5 okay? Maybe Markov chains? Just a very simple one-layer perceptron? Once you take someone's copyrighted work and you do anything with it without consent, you're breaking some implicit trust.

However, obviously there's a lot of tension here: free speech, transformed works, copyright owners, profit making, etc. That's why I don't think it's really that important to figure out exactly what consent was broken and when; it's more important to be forward-looking and plan for what might come next.

> Once you take someone's copyrighted work and you do anything with it without consent, you're breaking some implicit trust.

While I agree there's a parallel, do consider what that trust is with regards to putting up an HTTP server. It's kind of like handing out flyers you made yourself. The server is yours and you're handing out your content on your own. Someone is going around accepting such flyers and putting them in their pocket (HTTP cache, maybe a browser's, maybe an indexer's). Then somebody asks them where they might find a barber, and they remember one such flyer was about barbers and they show them the flyer or part of it.

What implicit trust was broken? This is HTTP, the online equivalent of handing out flyers.

Part of the problem here is that copyright is quite a broken concept. That's why it has so much wiggle room, such as "fair use".

Funny example, because if you create a flyer, you own the copyright to said flyer :) So if someone else uses that flyer to make money, you can sue them and win in court (unless the derived work is transformative, critiques it, yadda yadda). And this is the kind of hair-splitting that can get you into trouble, because I think it's clear that ChatGPT's training is more transformative than Google's indexing/PageRank, but we're somehow more upset at ChatGPT than we are at Google.

I was defending your point. Google's indexing was a valid parallel.

I saw, I'm just re-emphasizing/clarifying.

How does copyright maximalism promote the progress of science and the useful arts?

The pendulum has swung so high that it's going to break on the return.

Hollywood circumvented the patents on film making, so this goes back a fair way. Getting funded and then breaking laws seems to be the paradigm.