
Just a fairly arbitrary number. It's easy to reproduce a few lines from memory, even tens of lines, and that's "obviously" fair use. I would be surprised if many of us haven't inadvertently "copied" some GPL code in this way!

This goes to the "substantial" test for fair use. Clips from a film can contain core plot points, quotes from a book can contain passages vital to understanding a character, and screen captures and scrapes of a website can contain huge amounts of textual detail, yet depending on the four fair-use factors they can all still be fair use. (There have been exceptions though.)

The reaction on Hacker News to a machine producing code trained on their works is no different than the reactions artists and writers have had to other ML models. I suspect many of us are biased because it strikes at what we do and we think that our copyrights (because we have so many neat licenses) are special. They are not.

I think it would need to get to the level of "Copilot will emit a kernel module" before it's not obviously fair use.

After all, Google Books will happily convey to me whole pages from copyrighted works, page after page after page.

https://www.google.com/books/edition/Capital_in_the_Twenty_F...