by infogulch
1807 days ago
Why did you choose the standard of "substantial" = "100s of lines"? Especially since we've already seen examples of verbatim output in the dozens of lines range, that choice of standard is rather conveniently just outside what exists so far. If we find a case with 200 lines of verbatim output will you say the only reasonable standard is 1000s of lines? I don't think your argument is as strong as you're making it out to be.
This goes to the "substantial" test for fair use. Clips from a film can contain core plot points, quotes from a book can contain passages vital to understanding a character, and screen captures and scrapes of a website can contain huge amounts of textual detail, yet each can still be fair use, depending on the four fair-use factors. (There have been exceptions though.)
The reaction on Hacker News to a machine, trained on their works, producing code is no different from the reactions artists and writers have had to other ML models. I suspect many of us are biased because it strikes at what we do and we think that our copyrights (because we have so many neat licenses) are special. They are not.
I think it would need to get to the level of "Copilot will emit a kernel module" before it's not obviously fair use.
After all, Google Books will happily convey to me whole pages from copyrighted works, page after page after page.
https://www.google.com/books/edition/Capital_in_the_Twenty_F...