Hacker News
by fchu 1805 days ago
If a company built a tool like Copilot to help students write essays, would that be considered plagiarism? Probably yes, and the reason is that regurgitating blobs of text without actually thinking like a human and writing them anew doesn't feel like actual work, just direct re-use.

The same thinking probably applies to GitHub Copilot and copyright.


It’s already fairly commonplace for news agencies to generate articles using ML solutions such as https://ai-writer.com/

So by your logic ABC, CBS, Fox, and NBC have all been plagiarizing and violating copyright by doing so? I’m not sure if there’s been a legal challenge or precedent set in that case yet, but that seems like a more apples-to-apples comparison than the Google Books metaphor being used.

Disclosure: I work at GitHub but am not involved in Copilot.

The big question here is: On what data was the model trained? Presumably the news stations trained theirs on public-domain works and their own backlog of news articles, so even with manual copying there would be no infringement. In contrast, Copilot was trained on other people's code with active copyright.

That’s quite a big presumption IMO. Training sets need to be quite large in order to produce reasonable output. My understanding is that these companies provide the model themselves, which seems like it’d be trained on more than one company’s publications. But I get your point, and understand both sides of the argument here.

I think this will end up with a large class action lawsuit for sure, though I really think it’s a toss-up as to who would win it. This conversation was bound to happen eventually and we’re in uncharted territory here.

I think it’s going to hinge on whether machine learning is considered equivalent in abstraction to human learning, which will be quite an interesting legal, technological, and philosophical precedent to set if it goes that way.