In a discussion, one should strive not just to stick to saying things that are true, but to make sure that what they're saying is both true and relevant.
People should obtain their learning material legally. True statement (at least as "true" as any opinionated value judgement can be, I guess). The question, though, is: how is that statement relevant here?
Even by the NYT's own telling (OpenAI obtained their articles from the NYT's website), what OpenAI did was not illegal copyright infringement. The problem is that NYT is under the impression that allowing search engines to access their paywalled stuff, while hoping they do nothing with it besides putting up conventional, Google-style SERPs, turns into copyright infringement if one of them actually wants it for some other purpose. It doesn't. There's an adequate legal instrument available for NYT to use if they want to enforce conditions on use: a contract. Do they have a contract that somebody violated? If not, they have no cause to go after anyone. And if they do have a contract that was violated, then that's still not copyright infringement; it's a breach of their contract.
You seem to be confused. Giving someone permission to do X is only permission to do X and nothing else.
If I open a store, you have implied permission to enter the premises, but not to stay inside for 16 hours. Further, the second I ask you to leave, you no longer have permission to be on the premises. The store doesn’t need to add a lock or put up signs; you’re aware you don’t have permission, and that’s it.
Ditch the swipes; I'm not the one confused here. We're talking about copyright infringement, not trespass.
(Your analogy is bad and doesn't hold up. Copyright doesn't grant rightsholders control of the sort required here. It grants them the right to make and distribute copies. It doesn't grant the right to undistribute copies when it turns out they don't like what someone is doing with them.)
A core argument OpenAI is making is that transitory copying is allowed as long as training is fair use. But a permanent copy stored in a training database isn’t transitory and would itself be copyright infringement, so they don’t do that.
Thus training each version requires permission to download a new copy, which they now lack.
EDIT: The comment I first responded to (only partially reconstructible from the quoted parts below) has been edited into something almost completely different. Very uncool.
> ChatGPT keeps redownloading works to avoid the issue of keeping permanent copies of the training material
I don't know enough about how ChatGPT works to know whether or not that's true, but from an engineering standpoint it certainly sounds wrong, because of how insane it would be if true. I'm also not at all convinced that you're right about this, given how poorly you understand the other stuff you're arguing about that I happen to know you're wrong on, so it doesn't seem unwise to conclude that the same is probably true of your claims here. But it doesn't matter, anyway:
> which would be obvious copyright infringement
Wrong. OpenAI keeps asking for copies, NYT keeps giving them out (whether or not OpenAI has "permission"). Not copyright infringement, let alone "obvious copyright infringement".
This is going to be my last response that takes a substantial amount of effort to compose. Arguing with /r/confidentlyincorrect-tier zero-effort Gish gallops* is not a good use of my time.
* especially from someone shameless about editing their posts after the fact to make them diverge substantially from what was originally written
> OpenAI keeps asking for copies, NYT keeps giving them out
"The computer said OK" isn’t enough, or hacking would be legal.
> from an engineering standpoint it certainly sounds wrong
An engineering standpoint is completely irrelevant; this is a purely legal matter. The law is a strange place with its own rules. You need to actually look into it, not just make assumptions.