by knaik94
1091 days ago
I agree that there is additional nuance, but so far public data scraping has very clearly been ruled legal. It's possible that at the time of scraping, copyrighted data was incorporated into the training data because it hadn't yet been taken down by the host platform. But in my opinion, the core idea proposed by the suit, that private data was used intentionally, is not true. The GPT-4 browsing plugin is equivalent to web scraping. Another complication is that OpenAI is not exposing any static data: a response is generated only after prompting. I'd argue that LLMs are closer to calculators than databases in function. The amount of new information that can be added is also limited; it is not a continuous learning/training architecture. I do hope this leads to clearer laws regarding data privacy, but I can't imagine the allegations of "intercepting communications", violating the CFAA, or violating unfair competition law will hold.
|
To put it another way, it's legal for me to go to the library and borrow a DVD or a book or poems. That doesn't give me the right to publish the poems again under my own name. Whether I find the poems by scraping, by borrowing the book from a library, or even just by reading them off a wall, I don't get ownership rights to that data.
The same logic applies to a lot of other laws around data. If you collect data on individuals, a bunch of laws come into play, and many of them are concerned less with how you got the data than with how you use it. The fact that it was scraped doesn't grant any special legal rights.