


	by VBprogrammer 1309 days ago
	Pushing the responsibility onto copyright owners rather than GitHub / Microsoft / Copilot seems unreasonable. I'm all for AI being used like this, but it also needs to come with some checks and balances to ensure it's not just regurgitating copyrighted code.


mring33621 1309 days ago

OK, then just use existing copyright licensing:

If a permissive, biz-friendly license (Apache 2.0, maybe others) is found in a given repo, then it can be used in the training set.

Otherwise, the repo cannot be used in a training set.


mbreese 1309 days ago

And then every snippet ever generated by a model trained on that data would have to include an acknowledgement for every repository in the training set.

The LICENSE file would be longer than the rest of the code.

(FWIW, I agree with you in theory, but in practice it's hard to get your head around what the ramifications of that would be.)


leni536 1309 days ago

Many permissive licenses (including Apache 2.0) require attribution.


coredog64 1309 days ago

If Joe Bag’O’Donuts copies and pastes LGPL code into his own personal repository that has an MIT license attached, is it safe for Copilot to train on it?

I’m really of the opinion that MS needs to document the training set and set a high bar for including additional repos.
