Those instructions seem to be for plugins, not scraping training data.
In any case, OpenAI should inspect every website's terms of use before ingesting its content into their training data. They shouldn't be exempted from this work. We shouldn't have to conform to their methods; there are laws and systems in place for that. Expensive, yes.
If we were talking about free, open-source AIs available to everyone (ironically, what OpenAI set out to become), I'd be inclined to agree with you. However, we're talking about commercialised AIs that scrape your intellectual property and turn it into a money-printing machine without paying you a dime.
> If you don't want me stealing and reusing your licensed open source code don't make it public
As a practical matter, the larger point completely aside: a nonzero number of individuals and corporations will indeed use licensed code internally if they come across it and feel it helps their goals.
Oracle and Microsoft would send you a million-dollar bill if you tried that with their products. You would be surprised by who rats out a company for a reward.