Hacker News
by echelon 1809 days ago
> Even without a license or permission, fair use permits the mass scanning of books, the storage of the content of those books, and rendering verbatim snippets of those books.

For commercial use and derivative works?

Authors won't incorporate snippets of books into new works unless they're reviews. Copilot is different.


Google Books is a commercial site which incorporated the snippets of millions of copyrighted works. And of course, sitting in thousands of Google servers/databases are full copies of each of those books, photos of each page, the OCRed text of each page, and indexes to search them. Even that egregious copying without a license or permission was considered fair use.

If anything, the ways in which Copilot is different aid Microsoft/GitHub's argument for fair use. Because Copilot creates novel works, they have a strong argument that their system is more transformative than Google Books, which just presents verbatim copies of books.

The Google Books example really misses the point. One of the reasons the judges considered it fair use was that it pointed back to the original sources (and thus potentially increased publishers' earnings).

Copilot does none of that. If all the ML companies are so sure this is fair use, I encourage them to train an AI on Disney movies to generate short cartoon snippets based on some description. There sure would be a court case.

The main issue here is less doing it than getting sufficiently good results. I've done work in generative AI before, and right now the state of the art is passable on single images, with some but not enough control, and is still weak on videos without heavy structure requirements. I expect that in 5-10 years we will have good enough models (or hardware) to do short video generation, and the question will get tested then. I also think a meaningfully good video requires audio, and have fun producing well-aligned text (for dialogue), audio of that text, and video frames. Aligning all that generation together is still challenging today.

> Authors won't incorporate snippets of books into new works

Of course they do, previous works are quoted all the time.

But that's another thing: Copilot doesn't quote; it encourages something more akin to plagiarism, doesn't it?

Plagiarism, pretending you made a work entirely yourself when you didn't, is rarely a matter for a court to decide, and the standards for what constitutes plagiarism can vary a lot. When I turn in projects for a course, I cite sources in the comments a lot, even if what I turn in is substantially modified. An employer generally doesn't care if you copied and pasted code from StackOverflow or wherever, so long as you don't expose them to a suit and you don't lie if asked "Did you write this 100% yourself?"

Citing your source is not a get-out-of-jail-free card for copyright infringement; it doesn't really matter.

> Citing your source is not a get-out-of-jail-free card

No, but it's a requirement of the license stackoverflow.com uses, which is unfortunate for code (as opposed to text, where a quote can be easily attributed).

...with attribution.

And without. Attribution isn't a "copyright escape clause"; copying a work without permission is still infringement unless it's fair use.

Plagiarism is not the same as infringement.