Hacker News
by bugfix-66 1312 days ago
It's interesting to consider how you might prevent training using a license without being too restrictive.

Here is an example of a license that attempts to directly prohibit training. The problem is that, read literally, such a license would forbid using the software anywhere in a system that might be used for training or inference (in the OS, for example). Somehow you also need to specify that the software is used directly... But how, and what does that even mean? This is left as an exercise for the reader, and I hope someone can write something better:

  The No-AI 3-Clause License
This is the BSD 2-Clause License, unmodified except for the addition of a third clause. The intention of the third clause is to prohibit, e.g., use in the training of language models, as well as use during language model inference. Such language models are used commercially to aggregate and interpolate intellectual property. This is done with no acknowledgement of authorship or lineage, and no attribution or citation. In effect, the intellectual property used to train such models becomes anonymous common property. The social rewards (e.g., credit, respect) that often motivate open source work are undermined.

  License Text:
https://bugfix-66.com/7a82559a13b39c7fa404320c14f47ce0c304fa...

This is such Luddite behavior.

How much hubris we have as a species, to think that our professions will endure until the end of the stars, and that the software we write will be eternal.

What we do now is no different from spinning cotton.

I'd be shocked if the total duration of human-authored programming lasted more than a hundred years.

I'll also wager that in thirty years, "we'll" write more software in any given year than in all of history up to that point.

I'm all on board if the Microsofts of the world are. But they choose to train their AI on OSS code and not on their own codebases. So clearly they think similarly to the parent; they just want you to forget about that part when it suits them.

If we pass laws restricting training on copyrighted material, the only organizations able to train will be large institutions.

Microsoft would benefit from restriction. Not us.

Would you pay for a product trained on, say, the MS Teams, SharePoint, or Skype codebases?

No, and no one else would either.

The spirit of this is good, but the implementation is garbage: you need a lawyer or a team of lawyers to do this right. You grandstand and soapbox in this weakly written paragraph, and it hurts the whole thing. You discuss social rewards, intentions, etc. This just reads like a Stallman-esque tirade.

What about fair use, both for the copying done during training and for the output the service produces?

We are witnessing a monstrous perversion of "fair use" and the greatest theft of intellectual property in human history.

Do you measure the value of IP by the amount of effort put into creating it, or only by the end result?

Currently US copyright law only cares about the end result. Effort has no meaning or bearing in any legal analysis of copyright matters.

Copyright infringement cases are tried in the infringer's jurisdiction.

This is the BSD 2-Clause License:

    1. Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in
       the documentation and/or other materials provided with the
       distribution.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Presumably, as long as GitHub Copilot:

a) fails to respect these itself, or

b) fails to present them to the user who is going to use its output verbatim or produce derivative code from it, so that the user can respect them,

then GitHub Copilot is either in violation of the license or a tool assisting in such a violation by stripping the license away†.

From TFA:

> David Heinemeier Hansson, creator of Ruby on Rails, argues that the backlash against Copilot runs contrary to the whole spirit of open source. Copilot is “exactly the kind of collaborative, innovative breakthrough that I’m thrilled to see any open source code that I put into the world used to enable,” he writes. “Isn’t this partly why we share our code to begin with? To enable others to remix, reuse, and regenerate with?”

I don't mean to disrespect DHH, but the "spirit of open source" isn't to share code around freely as if it were in the public domain, because it is not: an author gets to choose the framework within which their code may be used and modified††. Otherwise, one would have dedicated it to the public domain, with the WTFPL as a fallback for jurisdictions where one can't relinquish one's own work into the public domain.

† Depending on whether the AI/Microsoft can be held liable for the automated derivative, or whether the end user is.

†† cue GPL vs MIT/BSD