by AaronFriel 1808 days ago
Google Books is a commercial site that incorporates snippets of millions of copyrighted works. And of course, sitting on thousands of Google servers and databases are full copies of each of those books: photos of each page, the OCRed text of each page, and indexes to search them. Even that egregious copying without a license or permission was considered fair use.

If anything, the ways in which Copilot is different aid Microsoft/GitHub's argument for fair use. Because Copilot creates novel works, they have a strong argument that their system is more transformative than Google Books, which just presents verbatim copies of books.

1 comment

The Google Books example really misses the point. One of the reasons the judges considered it fair use was that it pointed back to the original sources (and thus potentially increased publishers' earnings).

Copilot does none of that. If all the ML companies are so sure this is fair use, I encourage them to train an AI on Disney movies to generate short cartoon snippets from a text description. There sure would be a court case.

The main issue here is less doing it than getting sufficiently good results. I've done work in generative AI before, and right now the state of the art is passable on single images, with some but not enough control, and is still weak on videos unless you impose heavy structure requirements. I expect that in 5-10 years we will have good enough models (or hardware) to do short video generation, and the question will get tested then. I also think a meaningfully good video requires audio, and have fun generating well-aligned text (for dialogue), audio of that text, and video frames. Aligning all of that generation together is still challenging today.