by WhatsName 514 days ago
You mean OpenAI's infamous "you shall not train on the output of our model" clause?

If that's contractually enforceable in their terms of service... then I have my own terms-of-service proposal that I've been kicking around here for several weeks, a kind of GPL-inspired poison pill:

> If the Visitor uses copyrighted material from this site (Hereafter: Site-Content) to train a Generative AI System, in consideration the Visitor grants the Site Owner an irrevocable, royalty-free, worldwide license to use and re-license any output or derivative works created from that trained Generative AI System. (Hereafter: Generated Content.)

> If the Visitor re-trains their Generative AI System to remove use of the Site-Content, the Visitor is responsible for notifying the Site Owner of which Generated Content is no longer subject to the above consideration. The Visitor shall indemnify the Site Owner for any usage or re-licensing of Generated Content that occurs prior to the Site Owner receiving adequate notice.

_________

IANAL, but in short: "If you exploit my work to generate stuff, then I get to use or give-away what you made too. If you later stop exploiting my work and forget to tell me, then that's your problem."

Yes, we haven't managed to eradicate a two-tiered justice system where the wealthy and powerful get to break the rules... But still, it would be cool to develop some IP-lawyer-vetted approach like this for anyone to use, some boilerplate ToS and agree-button implementation guidelines.

I still don't think this has legs, precisely because of this case.

They accessed the material through piracy. They never accepted a TOS. They will probably get away with acquiring the material however they liked because of fair use.

The technicality is that they redistributed the material by seeding it, which is a no-no.

That said, you might find inspiration in Midjourney's ToS. Anyone paying for less than a Business plan agrees that anyone else on the platform can sample their output and their prompts.

While this won't work too well when the access is indirect, via piracy or a "rogue contractor", it can be applied to the web crawlers the companies run directly.

It's incredibly hypocritical, too. They have become rich by training on valuable data produced by others, yet others are not allowed to train on valuable data produced by them.