by 1vuio0pswjnm7
35 days ago
The idea of "all the public works ever created" is easily contested. Not every work has been "published", let alone scanned, digitised or published to the internet.
The marketing for "AI" uses phrases like "the sum of all human knowledge" to refer to what has been used to create "models". The assumed irrelevance of non-published, "private" works is dubious if not absurd.
The internet now allows potentially anyone to publish anything, e.g., via personal websites, social media pages, etc. But that doesn't mean everyone partakes. How much of the unfiltered garbage published by those who do has been used to create these "models"?
"AI" companies will not reveal exactly what "works" were used to create the "models".
But if I were going to comment on Swartz, I would first ask whether the "AI" models are trained on the contents of JSTOR, or the contents of PACER (that are not being shared on the internet for free).
Otherwise, the comparison is difficult to make, IMHO.
For example, with respect to any materials from JSTOR, the "stealing" was done by the pirate library contributors, not the "AI" companies. And with respect to PACER, the "stealing" by Swartz was, technically, done from government computers.
If readers are into "above the law" conspiracy theories about "AI" companies, check out the bizarre story of the OpenAI employee who was the document custodian witness for the plaintiffs in the NYTimes copyright litigation. He committed suicide before testifying.