Hacker News
by nsagent 957 days ago
I'm not so sure about this. I'm not saying courts in the US will rule one way or the other; I'm just saying it's certainly not a foregone conclusion that training is fair use. Even if it is, the companies might not have done their due diligence.

Lots of the data they trained on is available for purchase (e.g. artists often sell prints or reproduction rights, the books in books3 are widely available, etc.). It's my understanding that companies like Stability and OpenAI did not attempt to determine whether the data they trained on was available for purchase and then buy a legitimate copy for training. That might cause them to run afoul of fair use doctrine in the US (I'm not sure about other jurisdictions).

See these excerpts describing fair use for copying library materials [1] (many of these collections are being released by groups referring to themselves as libraries):

> Copying a complete work from the library collection is prohibited unless the work is not available at a “fair price.” This is generally the case when the work is out of print and used copies are not available at a reasonable price. If a work, located within the library’s collection, is available at a reasonable price, the library may reproduce one article or other contribution to a copyrighted collection or periodical issue, or a small part of any other copyrighted work, for example, a chapter from a book. This right to copy does not apply if the library is aware that the copying of a work (available at a fair price) is systematic. For example, if 30 different members of one class are requesting a copy of the same article, the library has reason to believe that the instructor is trying to avoid seeking permission for 30 copies.

> The copying, whether performed by the library or whether unsupervised by the library patron, cannot be for a commercial advantage. This means that the library (or a copying service hired by the library) cannot profit from the copying. In addition, the copying for the patron must be done for purposes of private study, scholarship, or research.

[1]: https://fairuse.stanford.edu/overview/academic-and-education...


The availability of the copyrighted works is not determinative. Fair use in the US takes (at minimum) four factors into account, listed in the federal copyright statute: https://www.law.cornell.edu/uscode/text/17/107.

That quote from Stanford's library is not discussing fair use doctrine in general; it is stating what is permitted in those specific circumstances. There are plenty of instances of fair use where the underlying work was available at a fair price. That's the whole point of fair use law: a use of a work that is facially infringing escapes liability because that particular use is considered fair.