by nnvvhh
957 days ago
Current US copyright law does not clearly treat model training as infringement. Courts have a long history of permissiveness in the face of copyright challenges to new tech (e.g. the image search engine cases, Google v. Oracle and smartphones, Sony v. Universal and VCRs), and I predict it will happen again with AI. The cat is out of the bag, and judges know that finding training to be an infringement of each training example would have a negative impact on a new product category. If training were more obviously infringement, that permissiveness would be harder to sell, but in my opinion it's really difficult to argue that a "copy" of an example has been made during training (aside from the copy made to process the example).
Lots of the data they trained on is available for purchase (e.g. artists often sell prints or reproduction rights, the books in books3 are widely available, etc.). It's my understanding that companies like Stability and OpenAI did not attempt to determine whether the data they trained on was available for purchase and then buy a copy for training. That might cause them to run afoul of fair use doctrine in the US (I'm not sure about other jurisdictions).
See these excerpts describing fair use for copying library materials [1] (many of these collections are being released by groups referring to themselves as libraries):
> Copying a complete work from the library collection is prohibited unless the work is not available at a “fair price.” This is generally the case when the work is out of print and used copies are not available at a reasonable price. If a work, located within the library’s collection, is available at a reasonable price, the library may reproduce one article or other contribution to a copyrighted collection or periodical issue, or a small part of any other copyrighted work, for example, a chapter from a book. This right to copy does not apply if the library is aware that the copying of a work (available at a fair price) is systematic. For example, if 30 different members of one class are requesting a copy of the same article, the library has reason to believe that the instructor is trying to avoid seeking permission for 30 copies.
> The copying, whether performed by the library or whether unsupervised by the library patron, cannot be for a commercial advantage. This means that the library (or a copying service hired by the library) cannot profit from the copying. In addition, the copying for the patron must be done for purposes of private study, scholarship, or research.
[1]: https://fairuse.stanford.edu/overview/academic-and-education...