| The lawsuit is far more nuanced than you're letting on. There are several aspects that come into play: * Was it published publicly? This is basically defined in the courts as "if you make an unauthenticated web request, does the data return?". This is where scraping comes in: if you make the data available without authentication you can't enforce your TOS, because you can't validate that people actually accepted the TOS to begin with. * Is the data able to be copyrighted? This is where things get interesting: facts cannot be copyrighted, which is why a lot of scrapers are able to reuse data (things like weather, sports scores, even "for hire" notices can be considered factual). * If it would typically be considered covered by copyright, does fair use come into play? * Are there any other laws that come into play? For example, GDPR, CCPA, or other privacy laws can still add restrictions on how data is collected and used (this is complicated by the various jurisdictions as well). * Was the work done with the data transformative enough to allow it to bypass copyright protections? This goes back to when Google was scanning books. Because they were making a search engine, not a library, their search tool was considered transformative enough to allow them to continue. It's not enough to say "because it's on the internet, it's fair game for everyone to use". This is a really complicated area where things are evolving rapidly, and there's a lot of intersecting law (and case law) that comes into play. |
Another complication is that OpenAI is not exposing any static data. A response is generated only after prompting. I'd argue that LLMs are closer to calculators than databases in function. The amount of new information that can be added is also limited; it is not a continuous learning/training architecture.
I do hope this leads to clearer laws regarding data privacy, but I can't imagine the allegations of "intercepting communications", violating the CFAA, or violating unfair competition law will hold up.