Very interesting tool, yet completely irrelevant for the United States, as AI-generated content is not eligible for copyright protection by anyone (pending appeal). As it stands, y’all may not like this reality, but it’s quite clear in legal terms. Claiming an AI-generated work is protected by copyright simply doesn’t matter at present, regardless of which entity is asserting the right.
I don't believe this is the case — in the situation that is commonly referenced to make this point, someone sought to have an AI legally declared to be the author of a specific work, and that was ruled not to be possible. But I am not aware of cases where people use prompts to generate artwork with AI and have found it impossible to copyright.
What's the practical use of this? The AI doesn't know if the output is sufficiently different from the training material. If the output you get matches pre-existing content, the license these AI companies give you won't save you.
We don't see it aggressively enforced in the US (unclear if that status quo will continue) but copyright infringement is also in the criminal code, and that can't be indemnified.
Civil indemnification still means a sued party must go to court and assert it as a defense, and there's no guarantee that a judge won't throw it out as invalid. These are uncharted legal waters.
I guess I thought that if an image was generated by these tools, at least in the US, the Copyright Office did not consider it to have any copyright at all, and therefore it was public domain by default?
Note that you can still violate someone else's IP rights; only your own claims will be null and void if the courts determine you're not the creator of the content.
But I believe since a ToS isn't a copyright license, this can't really be enforced using copyright laws. Most likely they can ban you. Is there even a slim chance you could be sued for breach of contract? Hell if I know, I'm not a lawyer.
Thinking another layer deep, though, if someone used OpenAI tools to develop software that then later got used to compete with OpenAI, surely it would fully work around this already unenforceable ToS restriction anyway, right?
> “Specifically, we initialized the DeepSeek-Prover using the DeepSeekMath-Base 7B model (Shao et al., 2024). Initially, the model struggled to convert informal math problems into formal statements. To address this, we fine-tuned the DeepSeek-Prover model using the MMA dataset (Jiang et al., 2023), which comprises formal statements from Lean 4’s mathlib2 that were back-translated into natural language problem descriptions by GPT-4. We then instructed the model to translate these natural language problems into formal statements in Lean 4 using a structured approach.”
I was thinking of their general-purpose models, like DeepSeek-R1 and DeepSeek-V3, for which I haven't found evidence that OpenAI models were used to generate synthetic training data. But I didn't find this, so clearly my searching skills aren't great.