| This proposal gets made pretty frequently in one form or another, and (at least on HN) seems to usually get struck down on this or that procedural ground. But as the various regulatory and judicial and legislative processes grind through different parts of the modern intellectual property issue made so abundantly legible by the modern AI training data gold rush it seems ever more clear that one way or another, we’re going to get a new social contract on IP. Leaving aside for a moment the thicket of laws, precedents, jurisdictions, and regulatory inertia: we can vote with our feet as both customers and contributors for common sense now. So how about the following compromise: promote innovation by liberalizing the posture around training on roughly “the commons”, but insist that the resulting weights are likewise available to the public. Why do I have to take someone’s word for it that they’ve got a result around superposition or whatever on mech interp? I’d like to see it work given it’s everyone’s data pushing those weights. I speak only for myself but plenty of people seem to agree: I don’t mind big companies training on generally available data, I mind the IP-laundering. Compete on cost, compete on value-added software stacks, compete on vertical integration. There is lots of money to be made building a better mousetrap in terms of code and infrastructure and product innovation. Conduct the research in the open. None of this would be possible without an ocean of research and data subsidized in whole or in part by the public. Asserting any form of ownership over the result might end up being legal, but it will never be ethical. Meta isn’t perfect on this stuff, but they’re by far the actor pulling the conversation in that direction. Let’s encourage them to continue pushing the pace on stuff like LLaMA 3. |
what do you mean by that? as far i'm aware ANYTHING that you publish despite being on the internet or not, if there isn't a copyright notice, you should assume -> "all rights reserved"