| I wonder whether OpenAI legal can make the case for storing fuzzy hashes of the content, in the form of ssdeep[1] hashes or content-defined chunks[2] of said data, instead of the actual conversations themselves. After all, since the NYT has a very limited corpus of information, and supposedly people are generating infringing content using their APIs, said hashes can be used to compare whether such content has been generated. I'd rather have them store nothing, but given the overly broad court order I think this may be the best middle ground. Of course, I haven't read the lawsuit documents and don't know if NYT is requesting far more, or alleging some indirect form of infringement which would invalidate my proposal. [1] https://ssdeep-project.github.io/ssdeep/index.html [2] https://joshleeb.com/posts/content-defined-chunking.html |
For example, the judge seems to have asked if it would be possible to segregate data that the users wanted deleted from other data, but OpenAI has failed to answer. Not just denied the request, but simply ignored it.
I think it's quite likely that OpenAI has taken the PR route instead of seriously engaging with any way to constructively honor the request for retention of data.