Hacker News
by duskwuff 118 days ago
This isn't the defense you think it is. Performing a copyrighted work from memory (e.g. a piece of music, a poem, or a story) is still a copyright violation. There's no special protection for works that a human has memorized.

The key word in the HN headline is _can_.

Humans are not judged on the basis of what they _can_ do.

Reasoning about how to constrain tools on the basis of what they _could_ do, if e.g. used outside their established guardrails, needs to be very nuanced.

Correct; the ability of a model to reproduce source material verbatim does not necessarily make the model's existence illegal. However, using a model to do just that might very well present a legal liability for the user. I would be interested to see the extent to which models can "recite from memory" source code, e.g., from the various MS code leaks. Put another way, if I'm using LLM code generation extensively, do I need to run a filter on its output to ensure that I don't "accidentally" copy large chunks of the Windows codebase?
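To make the "filter on its output" idea concrete, here is a minimal sketch, assuming you have a reference corpus you are worried about reproducing. It flags generated code whose token windows overlap heavily with that corpus. The names `shingles` and `overlap_ratio`, the window size of 8, and any threshold you pick are all illustrative assumptions, not an established tool or a legal standard.

```python
import re


def shingles(text, n=8):
    """Return the set of n-token windows in text, ignoring whitespace differences."""
    # Tokens: identifiers/keywords, integer literals, or single punctuation characters.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", text)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(generated, reference_index, n=8):
    """Fraction of the generated text's n-token windows also present in reference_index."""
    windows = shingles(generated, n)
    if not windows:
        # Too short to form a single window, so nothing to compare.
        return 0.0
    return len(windows & reference_index) / len(windows)


# Hypothetical usage: build the index once from the corpus, then score each completion.
reference = "for (int i = 0; i < count; i++) { total += values[i]; }"
index = shingles(reference)
score = overlap_ratio("for (int i = 0;  i < count;  i++) { total += values[i]; }", index)
```

A high score only means "this looks copied from something in my index"; it says nothing about code that is not in the index, and nothing about whether a given overlap is legally significant.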
>There's no special protection for works that a human has memorized.

Who's liable for the copyright infringement if you can coax it out of a system? If you can bypass paywalls by using Google's cache feature (or, since they got rid of it, by using carefully crafted queries to extract the entire text via snippets), is Google on the hook or the person doing it?

Both. If I sell obviously pirated CDs on the street corner, it's not only illegal for me to copy them and sell them, it's also illegal for my customers to buy them.
>it's also illegal for my customers to buy them.

Is it? There are plenty of people prosecuted for running illegal streaming sites and torrenting (which involves uploading), but I don't know of any efforts to crack down on non-distributors.

Just because someone doesn't get arrested does not mean something is legal.
Yes. Both Google and the human in question.
1. How does this interact with the court rulings that both Google Books (i.e. large-scale scanning of books without the authors' consent) and Google snippets (the same, but for websites) are legal?

2. Google might not be the most sympathetic defendant, but what about libraries? They offer books to be borrowed, and some offer photocopiers. If you put the two together, you get a copyright infringement operation, all enabled by the library. Should libraries be on the hook too?

For #2, yes: you would be engaging in copyright infringement. The library, being on the hook, would probably ask you to stop if they noticed you copying full books. If not the first time, certainly the second.
>If you can bypass paywalls by using google's cache feature

That is quite different. Google serves (or used to serve) its users whatever the website presents to its crawler. It does not try to avoid paywalls or interact with the website in any capacity other than requesting information.