by chasing 1091 days ago
Yes I do. I own the work I create, even if it's publicly available. I do get to decide what happens with it.

> I do get to decide what happens with it.

No. Both legally and practically, you absolutely do not.

The only thing copyright law gives you is an exclusive right to sell it for a limited period of time, as a whole in its original form or similar -- and to transfer that right.

Regardless of your desires, anyone can reuse it under the conditions of fair use. They can copy parts of it for parody purposes. If they're not selling anything or taking away from your sales*, they can reproduce it verbatim for private purposes. And even if they are selling something, they can summarize it, quote from it, rephrase it, and so forth.

And you don't actually get to decide any of that.

* Edit: added "or..."

So you’re saying I’m right except in some narrowly carved-out situations. And I agree with you.
Nope. You said:

> I wasn't asked and I don't really care to donate work to large corporations like that... I do get to decide what happens with it.

And I said:

> No. Both legally and practically, you absolutely do not.

You think you get to decide whether large corporations can train on your work. I'm saying the law suggests you very much don't get to decide that.

Read the comments you're replying to. I didn't comment on the legality of ChatGPT training on my content; I said I didn't like it. Regardless, the act of posting content publicly does not mean I give up my copyright claim. Yes, there are fair use situations. Training ChatGPT might be one of them, but I'm not seeing a lot of concrete information one way or the other, and I am seeing arguments that ChatGPT could be considered a derivative work, which would place OpenAI in violation of my copyright.

Send some links if you see some definitive case law sorting this stuff out.

You are claiming that piracy is legal.
Anyone can read your blog and then post their own blog post using knowledge they learned while reading yours. ChatGPT "learned" from your blog in the same way.
Since the way GPT "learns" is not materially similar to how a human learns, I don't see why this talking point is particularly relevant. Nothing stops the courts from distinguishing between an AI and a human with regard to what may be permissible.
I agree, it seems like all the arguments that the use of data by AI should have no more restrictions than the use of data by humans hinge on the implicit (or sometimes explicit) assumption that human learning and machine learning are identical. While there are parallels, there also seem to be significant differences not only in how the learning is done, but also in outcomes for the person whose data is being used. And since a major purpose of IP, copyright, etc. is at least ostensibly to protect the creators of information from negative outcomes, I don't think the outcomes can be ignored when comparing human learning to ML.
Anthropomorphizing that it "learned" is disingenuous and I expect better from the HN crowd.

If ChatGPT regurgitates verbatim or nearly verbatim, something it slurped up from OP's blog, is that not plagiarism? Where do you draw the line? Where would a reasonable person draw the line?

A human is capable both of reciting things from memory in an infringing manner and of learning from experiences to create something new. Maybe we should tape people's mouths shut if they dare to violate copyright by reciting a copyrighted book word for word, or put them in a straitjacket if they recreate a copyrighted painting from memory.
Actually I fear that people that say this are doing worse than anthropomorphizing.

Often rather than claiming human aspects to the machine, they are going further, and claiming machine aspects to the human.

Using mechanistic analogies for explaining the human body or mind isn't new, but as machines become better and better at imitating humans, those analogies become more seductive.

That's my rant; the danger with 'AI' isn't so much that humans are enslaved by machines, but that we enslave each other -- or dehumanize each other -- with machines.

Like with everything in law, "intent" is paramount. Obviously it's not the trainer's, nor the end-user's goal to reproduce training set data verbatim; quite contrary, overfitting as such is undesirable.
Intent only goes so far. If I continually but unintentionally reproduce copyrighted works verbatim, I could still face consequences, particularly if I did not show due diligence in preventing it from happening in the first place.
But ChatGPT doesn’t spit out text verbatim from the blog.
Computers aren't people. Software isn't humans.
There is a difference between learning from your work and copying your work.

You are entitled to control its distribution and use. You are not entitled to control its influence and effects.

I think you've made up an irrelevant argument. The work has been incorporated into a commercial product, intentionally, under the control of someone else. Software isn't humans that pay taxes, appear in court, have rights, etc.
No, the work has not been. The impression that the work leaves on a neural network has been though.

AIs are not massive repositories of harvested data. The models are relatively small (<20GB).

A resized, smaller, or encoded version of an image is still subject to copyright. Calling an encoding an 'impression' is deceitful.
Not always.

https://www.pinsentmasons.com/out-law/news/google-thumbnails...

> A US court ruled this week that Google's creation and display of thumbnail images does not infringe copyright. It also said that Google was not responsible for the copyright violations of other sites which it frames and links to.

Part of this ruling is about how the images are used -- Fair use -- not just that they were stored in a particular way. If Google was using the smaller versions of the images (thumbnails) in other ways, it could have been infringing.

> The Court said that Google did claim fair use, and that whether or not use was fair depended on four factors: the purpose and character of the use, including whether such use is of a commercial nature or is for non-profit educational purposes; the nature of the copyrighted work; the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and the effect of the use upon the potential market for or value of the copyrighted work.

It's none of those things; these models train on petabytes of data. They store relationships of objects to each other, not the objects themselves.
Actually, people have been successfully sued for plagiarizing other works because they had internalized it and accidentally regurgitated it. So. The fact that content runs through a human brain doesn’t necessarily cleanse it from copyright concerns.
There is no "actually" because you are still addressing distribution. It wouldn't be hard to have another AI that analyzes outputs for copyright infringement and culls them as necessary.

Would that satisfy you?

To some extent. Others can ingest your work, quote it, talk about it, criticize it, summarize, etc.
If I read your blog and used its data along with my own knowledge to create a course, would that be plagiarism or copyright violation?