by anon373839 497 days ago
This is an interesting opinion, but there are aspects of it that I doubt will stand the test of time.

One aspect is the court’s ruling that West’s headnotes are copyrightable even when they merely quote a court opinion verbatim, because the editorial decision to quote the material itself shows a “creative spark”. It really isn’t workable — in law specifically - for copyright to attach to the mere selection of a quote from a case to represent that case’s holding on an issue. After all, we would expect many lawyers analyzing the case independently to converge on the same quotes!

The key fact underlying all of this, I think, is that when Ross paid human annotators to write their own versions of the headnotes, they really did crib from West’s wholesale rather than doing their own independent analysis. Source text was paraphrased in language curiously similar to West’s own paraphrasing. That, plus the fact that Ross was a directly competing product, is what I see as really driving this decision.

The case has very little to say about the more commonly posed question of whether copyright is infringed in large-scale language modeling.


> That, plus the fact that Ross was a directly competing product, is what I see as really driving this decision.

The "competing product" thing is probably the most extreme part of this opinion.

The most important fair use factor is whether the use competes with the original work, but this is generally understood to mean direct competition. For example, if you translate someone else's book from English to French and want to sell the translation, the translation is going to compete directly for sales to people who speak both English and French. The customer is going to use the copy claiming fair use as a direct substitute for the original work, instead of buying it.

This court is trying to extend that to anything downstream from it, which seems crazy. For example, "multiple copies for classroom use" is one of the explicit examples of fair use in the copyright statute, but schools are obviously teaching people who intend to go into competition with the original authors. More generally, the idea that you can't read something if you ever intend to write something to sell in competition with it seems absurd and contradicts common practice in reverse engineering.

But this is also a district court opinion that isn't even binding on other courts, so we'll see what happens if it gets appealed.

No, that is not an extreme interpretation of the fair use factors. This is a routinely emphasized factor in fair use analyses for both copyright and trademark. School fair use is different because that defense is written into the statute directly in 17 U.S.C. § 107. Also, § 108 provides extensive protections for libraries and archives that go beyond fair use doctrines.

The idea that the schools are encouraging the students to compete with the original authors of works taught in the classroom is fanciful by the meaning that courts usually apply to competition. Your example is different from this case in which Ross wanted to compete in the same market against West offering a similar service at a lower price. Another reason that the schools get a carveout is because it would make most education impractical without each school obtaining special licenses for public performance for every work referenced in the classroom.

But maybe that also raises the question of whether schools really deserve that kind of sweetheart treatment (a massive indirect subsidy), or whether it over-privileges formal schools relative to the commons at large.

> School fair use is different because that defense is written into the statute directly

It's written into the statute as an example of something that would be fair use.

> The idea that the schools are encouraging the students to compete with the original authors of works taught in the classroom is fanciful by the meaning that courts usually apply to competition.

People go to art school primarily because they want to create art. People study computer science primarily because they want to write code. It's their direct intention and purpose to compete with existing works.

> Your example is different from this case in which Ross wanted to compete in the same market against West offering a similar service at a lower price.

So if you use Windows and then want to create Linux...

> Another reason that the schools get a carveout is because it would make most education impractical without each school obtaining special licenses for public performance for every work referenced in the classroom.

How is that logic any different than for AI training?

> But maybe that also raises the question of whether schools really deserve that kind of sweetheart treatment (a massive indirect subsidy), or whether it over-privileges formal schools relative to the commons at large.

It not only doesn't have any explicit requirement for a formal school (it just says "teaching"), it also isn't limited to teaching; teaching is just one of the things specified in the statute as being the kind of thing Congress intended fair use to include.

>It's written into the statute as an example of something that would be fair use.

Statutory text controls what the courts can do, even and perhaps especially when it includes an example.

>People go to art school primarily because they want to create art. People study computer science primarily because they want to write code. It's their direct intention and purpose to compete with existing works.

Interesting perspective.

>So if you use Windows and then want to create Linux...

I don't understand your meaning.

>How is that logic any different than for AI training?

That is what Mark Lemley, a law professor at Stanford, has argued in his many law review articles and amicus briefs: he believes that training is analogous to learning. The court here didn't agree with Lemley's view.

>It not only doesn't have any explicit requirement for a formal school (it just says "teaching"), it also isn't limited to teaching; teaching is just one of the things specified in the statute as being the kind of thing Congress intended fair use to include.

In practice courts tend to limit these exceptions to formal teaching arrangements.

Copyright covers expression, not ideas. The underlying problem here is that Ross Intelligence never went to the trouble of distilling the purely idea-based and factual elements from their original sources; even their finalized search system still relied pervasively on Westlaw's original and creative expression as embedded in its headnotes. Using Windows and then creating Linux is something entirely different, because Linux goes to great effort not to use anything that's specific to Windows. Large-scale language models are probably somewhere in the middle, because their unique reliance on an incredibly wide variety of published texts makes it very unlikely that they'll ever preserve anything of substance about the expression in any single text.

What a world we’re in where a school using text to teach children, who will remember it, talk about it with others, and likely buy it for their own children… can be framed as a “massive indirect subsidy” rather than “free advertising”.

This reflects on the individuals choosing to create and proliferate such misleading or hyperbolic framing more than it does on the world that we all live in. In meatspace we usually reject these ideas and ignore the people pushing them.

The case looks pretty straightforward to me - they copied the notes (human or machine doesn't really matter) to directly compete with the author of the notes.

If you wrote a program that automatically rephrased an original text - something like the Encyclopaedia Britannica - to preserve the meaning but not have identical phrasing, and then sold access to that information in a way that undercut the original, then in my view that's clearly ripping off the original creators of the Encyclopaedia. If such activity were allowed, it would likely stop people from writing new versions of the encyclopedia in the future.

These laws are there to make sure that valuable activities continue to happen and are not stopped by theft. We need textbooks, we need journalistic articles - and getting them requires paying people to work on them.

I think it's entirely reasonable to say that an LLM is such a program. If it is used on sources that are sustained by paying people to work on them, and the reformatted content is then sold in a way that undercuts the original activity, then that's a theft that's clearly damaging society.

I see LLMs as simply a different way to access the underlying content, so the rules of the underlying content should still apply. ChatGPT's revenues are predicted to be in the billions this year; sending some of that to content creators, so that content continues to be produced, is not just right, it's in their interest.

> automatically rephrased an original text - something like the Encyclopaedia Britannica - to preserve the meaning but not have identical phrasing

Note that it's very hard to do this starting from a single source, because in order to be safe from any copyright concern you'd have to preserve only the bare "idea" and make everything else in your text independent. But LLMs seem to be able to get around this by looking at many sources that are all talking about the same facts and ideas in very different ways, and then successfully generalizing "out of sample" to a different expression of the same ideas.

Clustering concepts across multiple sources allows you to rephrase more accurately while retaining meaning. However, my point is this: if you then point that program at Encyclopaedia Britannica, simply rephrase it, and charge for access to the rephrased version, should you be allowed to do that?
The underlying problem is that "meaning" in the ordinary sense still includes plenty of copyrightable elements. If you point a typical LLM program at some arbitrary text and tell it to "rephrase" that, you'll generally end up with a very close paraphrase that still leaves the "structure, sequence and organization" (in a loose sense) of the original largely intact. So it turns out that you're still in breach of copyright. All you're allowed to use when starting from a single copyrighted text is the ideas and facts in their very barest sense.

So if I made a pop song that was entirely copied from existing songs - but ensured that each fragment was relatively short (but long enough to be recognisable) - then I'd be OK?

i.e., the way to avoid copyright infringement is to double down on the copying?

I can see how, for a human, you could argue that there is creativity in splicing those bits together into a good whole - however if that process is automated - is it still creative - or just automated theft?

I think that someone taking Biology 101 and ending up writing textbooks (as opposed to all the other people who just forgot what they learned once the elective was over, or who ended up as working biologists with labs, or as biology teachers, and so forth) is quite different from someone saying "hey, I want to make a competing product to this successful company; let's take their content, rewrite it, and use AI to make a competitor," and then actually going into direct competition with that company a couple of years later.

> court’s ruling that West’s headnotes are copyrightable even when they merely quote a court opinion verbatim

That is the opposite of the ruling. The judge said the headnotes that summarize and pick out the important parts are copyrightable, and specifically excluded the headnotes that quote the court opinion verbatim.

The judge:

"But I am still not granting summary judgment on any headnotes that are verbatim copies of the case opinion (for reasons that I explain below)"

You're right as far as the MSJ (motion for summary judgment) is concerned, and I should've been more precise. I was focusing on the dictum in the preceding paragraph (because we're discussing the broader implications of the order rather than the nuts and bolts of the instant motion). In that paragraph, the judge wrote:

> More than that, each headnote is an individual, copyrightable work. That became clear to me once I analogized the lawyer’s editorial judgment to that of a sculptor. A block of raw marble, like a judicial opinion, is not copyrightable. Yet a sculptor creates a sculpture by choosing what to cut away and what to leave in place. That sculpture is copyrightable. 17 U.S.C. §102(a)(5). So too, even a headnote taken verbatim from an opinion is a carefully chosen fraction of the whole. Identifying which words matter and chiseling away the surrounding mass expresses the editor’s idea about what the important point of law from the opinion is. That editorial expression has enough “creative spark” to be original. ... So all headnotes, even any that quote judicial opinions verbatim, have original value as individual works.

I personally don't think this sculpture metaphor works for verbatim quotes from judicial opinions.

Yeah, I'm willing to bet that metaphor gets called out as ludicrous by a higher court, as it has broader implications across types of editorial expression that break down when examined.

The marble from which a sculpture is carved is not itself a copyrighted work. If we imagine it as having copyright protection, then to the extent the marble is still recognizable after the editorial expression, the sculpture would itself have to qualify as fair use.

Your argument for why verbatim selection from a court decision is not analogous, for copyright purposes, to a sculptor carving from a block of material rests on two premises, though, and both are wrong: the more general premise (that a work must not infringe someone else’s work in order to be subject to copyright) and the more specific premise (that court decisions are subject to copyright in the United States).

> Yeah, I'm willing to bet that metaphor gets called out as ludicrous by a higher court, as it has broader implications across types of editorial expression that break down when examined.

It's not ludicrous at all. Whether a work of "selection" from an existing source can be copyrightable in its own right would probably have to be judged on pretty much a case-by-case basis, but even in the context of "selecting" from a ruling there are almost certainly many cases where that work is creative and original enough that it can sensibly be protected by copyright.

> It really isn’t workable — in law specifically - for copyright to attach to the mere selection of a quote from a case to represent that case’s holding on an issue. After all, we would expect many lawyers analyzing the case independently to converge on the same quotes!

I guess whether we’d expect multiple lawyers to converge on the same solution depends on how long the source is, and how long the collection of quotes is. I don’t think it is totally obvious, though…

I’m also not sure if that’s a generally good test. It seems great for, like, painting. But I wouldn’t be surprised if we could come up with a photography scene where most professionals would converge on the same shot…

If close paraphrase can be detected, this ought to be proof enough that some non-trivial element of creativity was involved in the original text. That's because purely functional and necessary elements are not protected by copyright, even when they would otherwise be creative (this is technically known as the 'scènes à faire' case), and surely a "quote" which is unavoidable because it factually and unquestionably is the core of the ruling would have to fall under that.

Isn't the argument that the act of selecting the right quote is the real work - and the work the copier avoided in the act of copying?

You could argue that all the words are already in the dictionary, so none of them are new; you are just quoting from the dictionary in a particular order...

The reason you have people, rather than computers, interpreting the law is that people can make judgements that make sense. Fundamentally, these laws are there to protect work from being unfairly ripped off.

What was clearly done in this case was a rip-off which damaged the original creator - everything else is dancing on the head of a pin.

Copyright does not protect work ("sweat of the brow"), it only protects expression and creativity. Thus, whenever there is only one right expression or even a bare handful in any given context, copyright does not apply to that particular choice. By analogy, arranging words in some semi-arbitrary order can be an expressive choice, whereas using what's effectively a fixed phrase is not, even though the two might look similar and involve a comparable amount of "work".

The intention of copyright is to protect useful work.

The detail of how to do that in a fair way that doesn't block other people is complex[1] - you can never cover all possibilities in a written law - which is why you have people interpreting them and making judgements. All I'm saying is that the guiding light in that interpretation is that copyright is there to protect the justifiable work of people in a fair way.

Somebody taking those law notes and trivially copying them to directly compete is clearly not 'fair use'.

If those notes could have been created mechanically directly from the original source - why didn't the copier do that, rather than use the competitor's work?

[1] given the endless creativity of humans to game systems.

> The intention of copyright is

..."to promote the progress of science and useful arts". I don't see anything in there about rewarding 'work' irrespective of whether that work involves any kind of creativity.

> If those notes could have been created mechanically directly from the original source - why didn't the copier do that

That's actually a very good question. In practice, I do absolutely agree that the notes involve plenty of originality and creativity.

> The intention of copyright is ..."to promote the progress of science and useful arts". I don't see anything in there about rewarding 'work' irrespective of whether that work involves any kind of creativity.

Not sure where you got that quote from, but I'd say the work aspect is implicit in "promote the progress", i.e., progress requires that people are able to get paid for their work to progress science or the useful arts.

If the progress was trivial and required no work then it wouldn't need protection or promotion.

And sure it's phrased that way to get the balance between fair use and protection - but if there was no need of protection then copyright wouldn't need to exist - as free reuse is the default.

I think this is the best takeaway. This case and its outcome are restricted to its facts. Most of the LLM activity today is very different from what happened here.

My experience using Westlaw Keycites at work is that they’re not primarily created by fishing a quote out of a holding, but instead by synthesizing a rule. If I want a summary, I read the Keycite; if I want a money quote, I root around in the case linked to the Keycite.

Have you seen different? I’m curious what area of law you practice and in what state, for comparison’s sake.

Yeah, I'd agree that most are synthesized. But I do frequently see headnotes that are verbatim or nearly verbatim slices from the text. Just grabbing a case at random: Kearney v. Salomon Smith Barney, Inc., 39 Cal.4th 95 (2006). The 4th headnote reads:

> The federal system contemplates that individual states may adopt distinct policies to protect their own residents and generally may apply those policies to businesses that choose to conduct business within that state.

And the opinion reads:

> [T]he federal system contemplates that individual states may adopt distinct policies to protect their own residents and generally may apply those policies to businesses that choose to conduct business within that state.

The crux is fair use, and until lobbyists change the four-factor test, AI training has an uphill battle in court. It’s a very disliked observation in this forum, but I stand by my principles on this one because the courts see it my way. Derivative works, especially by artificial means, simply fail the test miserably and that’s the truth.

Collections of essays or poems are considered copyrightable. This seems analogous enough to me.

>the court’s ruling that West’s headnotes are copyrightable even when they merely quote a court opinion verbatim, because the editorial decision to quote the material itself shows a “creative spark” ... when Ross paid human annotators to write their own versions of the headnotes, they really did crib from West’s wholesale rather than doing their own independent analysis

... so it follows that it was then Ross's annotators showing the creative spark