by yccs27 1812 days ago
The big question here is: On what data was the model trained? Presumably the news stations trained theirs on public-domain works and their own backlog of news articles, so even with manual copying there would be no infringement. In contrast, Copilot was trained on other people's code with active copyright.
1 comment

That’s quite a big presumption IMO. Training sets need to be quite large in order to produce reasonable output. My understanding is that these companies provide the model themselves, which seems like it’d be trained on more than one company’s publications. But I get your point, and understand both sides of the argument here.

I think this will end up in a large class-action lawsuit for sure, though I really think it’s a toss-up as to who would win it. This conversation was bound to happen eventually, and we’re in uncharted territory here.

I think it’s going to hinge on whether machine learning is considered equivalent in abstraction to human learning, which will be quite an interesting legal, technological, and philosophical precedent to set if it goes that way.