by aeon_ai 105 days ago
You've likely paid attention to the litigation here. Regardless of what remains to be litigated, the training in and of itself has already been deemed fair use (and transformative) by Alsup.

Further, you know that ideas are not protected by copyright. The code comparison in this demonstrates a relatively strong case that the expression of the idea is significantly different from that of the original code.

If it were the case that the LLM ingested the code and regurgitated it (as would be the premise of highlighting the training data provenance), that similarity would be much higher. That is not the case.

You're right, I've followed the litigation closely. I've advocated for years that "training is fair use" and I'm generally an anti-IP hawk who DEFENDS copyright/trademark cases. Only recently have I started to concede the issue might have more nuance than "all training is fair use, hard stop." And I still think Judge Alsup got it right.

That said, even if model training is fair use, model output can still be infringing. There would be a strong case, for example, if the end user guides the LLM to create works in a way that copies another work or mimics an author's or artist's style. This case clearly isn't that. I haven't personally compared the code at issue here, so I can't speak to the similarity. I hope you're right.

I think a “strong case” probably depends on a few points on the output side, and would have to rest on more than just an author’s or artist’s style.

Style itself would be very hard to deem infringing, for obvious reasons (it’s an idea). I think it’s much more likely to be an issue when a character has derivative elements (e.g., Iron Man or Spider-Man-esque features) and the user’s prompt had explicit references to those characters (intent).

All that said, even then, on the artistic side I think it would come down to the same analysis that would apply to traditional media - AI is just a vehicle that introduces some novel risks.

Music might be riskier given the litigious nature of the industry.

Code? It’s going to be hard to claim infringement with dramatically different implementations, barring patent coverage.

> The code comparison in this demonstrates a relatively strong case that the expression of the idea is significantly different from that of the original code.

Can I use one AI agent to write detailed tests based on disassembled Windows, and another to write code that passes those same function-level tests? If so, I'm about to relicense Windows 11 - eat my shorts, ReactOS!