Hacker News
by elfatizer 1336 days ago
There are lots of comments arguing for or against Copilot as a value judgment, and opinions on whether it is ethical, legal, etc. aren't going to be the same for everyone. But I think that regardless of where you stand, there should be some sort of legal ruling to clarify the gray areas that Butterick breaks down.
4 comments

Agreed, but I also hate how so much of our substantive law basically has to be created by the courts because (a) many of our legislatures, especially at the federal level, have become more and more non-functional, and (b) IMO legislatures are especially bad at implementing technical legislation.

I think there is a good, fundamental legal/societal question of how copyright should apply to AI output. I just don't think our existing copyright structures handle this question well.

Note there is currently a very important case before the SCOTUS that is related to this issue, [1] where the original photographer of a Prince photo is suing Andy Warhol's estate for copyright infringement. The fundamental question is whether the Warhol series of paintings is "transformative" enough of the original photo. While there are always gray lines on what "transformative" means, if there is any chance that Warhol's paintings are legal and not infringing, I don't see how Copilot could be in the wrong. Copilot's output, even if it contains a substantial amount of the original source, appears to me much more "transformative" than the Warhol paintings are compared to the original photo.

1. https://www.npr.org/2022/10/12/1127508725/prince-andy-warhol...

I agree. Any law that's only clear after a court ruling is, de facto, an ex post facto law. Disgusting.
That's how common law works. It's not disgusting (unless perhaps you're an overzealous adherent of civil law), nor is it ex post facto. Legislation is produced, (claimed) grey areas are challenged in court, and if the outcomes appear unfair then legislators (should) update the law.

Badly written law and poor legislators are a problem in any system.

What's the solution?

Is it really better to only draft laws that are clear without courts?

Is that provable?

Bingo, I feel so uneasy at the thought we could risk a lawsuit because a colleague put unlicensed code in our repos.
Butterick sneakily asserts over and over that Copilot is simply retrieving code from Github ("Copilot's whizzy code-retrieval methods", "Copilot is merely a convenient alternative interface to a large corpus of open-source code", "our work is stashed in a big code library in the sky called Copilot"). This verbiage seems specifically chosen to present a misleading picture of what Copilot is and does.

Copilot is a set of trained weight values in a matrix. There is no source code stored in that matrix. The fact that someone can prompt Copilot with specifically chosen text to generate a short sequence of code that matches a corresponding segment of code used to train the model does not mean that it is somehow "just retrieving" that snippet. It is _generating_ that code, guided by the weight matrix, via pattern-matching based on the chosen textual prompt and surrounding context.

That distinction is significant because one of the primary defenses against copyright infringement in US law is that the derived work is transformative. Copilot is a work derived in part from GitHub code, but it has unique capabilities far beyond returning short snippets of input code, and the work itself is clearly an extensive transformation of the input data.

This is without even considering whether concrete _outputs_ of the model that happen to match code in a repository used to train it are themselves protected via copyright or not, which is another issue entirely (and not as cut and dried as many folks on here seem to think).
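
To make the generation-versus-retrieval point above concrete, here's a toy sketch. It is emphatically not how Copilot works: the bigram counter, the two-snippet corpus, and the `train`/`generate` names are all made up for illustration. The "model" keeps only next-token counts, yet greedy decoding from a one-token prompt can still emit a training snippet token for token:

```python
from collections import Counter, defaultdict


def train(snippets):
    """Build next-token counts (the toy model's 'weights') from token lists."""
    weights = defaultdict(Counter)
    for tokens in snippets:
        for current, following in zip(tokens, tokens[1:]):
            weights[current][following] += 1
    return weights


def generate(weights, prompt, max_new_tokens):
    """Greedily extend the prompt one token at a time using the counts."""
    output = list(prompt)
    for _ in range(max_new_tokens):
        candidates = weights.get(output[-1])
        if not candidates:
            break  # no known continuation for the last token
        output.append(candidates.most_common(1)[0][0])
    return output


# Hypothetical two-snippet "training set", chosen so no token pair is ambiguous.
corpus = [
    "for i in range ( n ) : total += i".split(),
    "return a + b".split(),
]
weights = train(corpus)

# The prompt steers which learned pattern gets regenerated.
generated = generate(weights, ["for"], 10)
print(" ".join(generated))
```

Whether you call that output "retrieved" or "generated" is pretty much the disagreement in this thread; the sketch only shows the mechanism being described, at a scale where the counts are easy to inspect.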

Correct. He's written a great opening argument, as long as you're the sort of person who likes speeches. To me it was full of tricks to prime the reader into accepting his premises as axiomatic, from nuanced rhetoric to pull quotes with attractive color gradients. In my view his actual motivation is the typical 30% cut of any class action settlement that goes to the lawyers, and he sees a lucrative opportunity to combine two skillsets.
And if that's the case, the legal ruling shouldn't stop at code. It should encompass any images or text that aren't in the public domain, aren't owned by the training entity, and don't otherwise permit this use. That probably throws a massive spanner into a lot of machine learning, which may well be fine but would probably be a big setback for ML generally.

Any higher court ruling might well draw some lines between different domains, but be clear that a ruling against GitHub would almost certainly be a ruling for copyright maximization and against fair use in other respects. So be careful what you wish for.