by capnrefsmmat 372 days ago
Courts have always had the power to compel parties to a current case to preserve evidence. (For example, this was an issue in the Google monopoly case, since Google employees were using chats set to erase after 24 hours.) That becomes an issue in the discovery phase, well after the defendant has an opportunity to file a motion to dismiss. So a case with no specific allegation of wrongdoing would already be dismissed.

The power does not extend to any of your hypotheticals, which are not about active cases. Courts do not accept cases on the grounds that some bad thing might happen in the future; the plaintiff must show some concrete harm has already occurred. The only thing different here is how much potential evidence OpenAI has been asked to retain.


> Courts have always had the power to compel parties to a current case to preserve evidence.

Not just that: even without a specific court order, parties to existing or reasonably anticipated litigation have a legal obligation, which attaches immediately, to preserve evidence. Courts tend to issue orders when a party presents reason to believe another party is out of compliance with that automatic obligation, or when there is a dispute over the extent of the obligation. (In this case, both factors seem to be in play.)

Lopez v. Apple (2024) seems to be a recent and useful example of this; my lay understanding is that Apple was found to have failed in its duty to switch from auto-deletion (even if that auto-deletion was contractually promised to users) to an evidence-preservation level of retention, immediately when litigation was filed.

https://codiscovr.com/news/fumiko-lopez-et-al-v-apple-inc/

https://app.ediscoveryassistant.com/case_law/58071-lopez-v-a...

Perhaps the larger lesson here is: if you don't want your service provider to end up being required to retain your private queries, there's really no way to guarantee it, and the only real mitigation is to choose a service provider who's less likely to be sued!

(Not a lawyer, this is not legal advice.)

So if Amazon sues Google, claiming that it is being disadvantaged in search rankings, a court should be able to force Google to log all search activity, even when users delete it?
Yes. That's how the US court system works.

Google can (and would) file to keep that data private and only the relevant parts would be publicly available.

A core aspect of civil lawsuits is that everyone gets to see everyone else's data. It's that way to ensure everything is on the up and up.

A great model – in a world without the Internet and LLMs (or honestly just full text search).
Maybe you misunderstood. The data is required to be retained, but there is no requirement to make it accessible to the opposition. OpenAI already has this data and presumably mines it themselves.

Courts generally require far more data to be retained than shared, even if this ask is much more lopsided.

If Amazon sues Google, a legal obligation to preserve all evidence reasonably related to the subject of the suit attaches immediately when Google becomes aware of the suit, and, yes, if there is a dispute about the extent of that obligation and/or Google's actual or planned compliance with it, the court can issue an order relating to it.
At Google's scale, what would be the hosting costs of this I wonder. Very expensive after a certain point, I would guess.
>At Google's scale, what would be the hosting costs of this I wonder. Very expensive after a certain point, I would guess.

Which would be chump change[0] compared to the costs of an actual trial with multiple lawyers/law firms, expert witnesses and the infrastructure to support the legal team before, during and after trial.

[0] https://grammarist.com/idiom/chump-change/

It can be just anonymised search history in this case.
> It can be just anonymised search history in this case.

Depending on the exact issues in the case, a court might allow that (more likely, it would allow only turning over anonymized data in discovery, if the issues were such that there was no clear need for more), but generally the obligation to preserve evidence does not include the right to edit evidence or replace it with reduced-information substitutes.

We found that one was a bad idea in the earliest days of the web when AOL thought "what could the harm be?" about turning over anonymised search queries to researchers.
How did you go from a court order to preserve evidence to dumping that data raw into the public record?

Courts have been dealing with discovery including secrets that litigants never want to go public for longer than AOL has existed.

That sounds impossible to do well enough without being accused of tampering with evidence.

Just erasing the userid isn’t enough to actually anonymize the data, and if you scrubbed location data and entities out of the logs you might have violated the court order.

Though it might be in our best interests as a society, we should probably be honest about the risks of this tradeoff; anonymization isn’t some magic wand.

So then the courts need to find who is setting their chats to be deleted and order them to stop. Or find specific infringing chatters and order OpenAI to preserve these specified users’ logs. OpenAI is doing the responsible thing here.
OpenAI is the custodian of the user data, so they are responsible. If you wanted the court (i.e., the plaintiffs) to find specific infringing chatters, first they'd have to get the data from OpenAI to find who it is -- which is exactly what they're trying to do, and why OpenAI is being told to preserve the data so they can review it.
So the courts should start ordering all ISPs, browsers, and OSs to log all browsing and chat activity going forward, so they can find out which people are doing bad things on the internet.
No, they should not.

However, if the ISP, for instance, is sued, then it (immediately and without a separate court order) becomes illegal for them to knowingly destroy evidence in their custody relevant to the issue for which they are being sued, and if there is a dispute about their handling of particular such evidence, a court can and will order them specifically to preserve relevant evidence as necessary. And, with or without a court order, their destruction of relevant evidence once they know of the suit can be the basis of both punitive sanctions and adverse findings in the case to which the evidence would have been relevant.

If those entities were custodians in charge of the data at hand in the court case, the court would order that.

This post appears to be full of people who aren’t actually angry at the results of this case but angry at how the US legal system has been working for decades, possibly centuries, since I don’t know when this precedent was first set.

Is it not valid to be concerned about overly broad invasions of privacy regardless of how long such orders have been occurring?
What privacy specifically? The courts have always been able to compel people to recount things they know which could include a conversation between you and your plumber if it was somehow related to a case.

The company records and uses this stuff internally; retention is about keeping information accurate and accessible.

Lawsuits allow, in a limited context, the sharing of non-public information held by individuals/companies in the lawsuit. But once you submit something to OpenAI, it’s now their information, not just your information.

It's not an “invasion of privacy” for a company that already had data to be prohibited from destroying it when it is sued in a case where that data is evidence.
Yeah, sure. But understanding the legal system tells us the players and what systems exist that we might be mad at.

For me, one company being obligated to retain business records during civil litigation against another company, reviewed within the normal discovery process, is tolerable. Considering the alternative is lawlessness, I'm fine with it.

Companies that make business records out of invading privacy? They, IMO, deserve the fury of 1000 suns.

It’s not private. You handed over the data to a third party.
If you cared about your privacy, why are you handing all this stuff to Sam Altman? Did he represent that OpenAI would be privacy-preserving? Have they taken any technical steps to avoid this scenario?
> So the courts should start ordering all ISPs, browsers, and OSs to log all browsing and chat activity going forward, so they can find out which people are doing bad things on the internet.

Not "all", just the ones involved in a current suit. They already routinely do this anyway (Party A is involved in a suit and is ordered to retain any and all evidence for the duration of the trial, starting from the first knowledge that Party A had of the trial).

You are mischaracterising what happens; you are presenting it as "Any court, at any time, can order any party who is not involved in any suit in that court to forever hold user data".

That is not what is happening.

Or you didn't read what was written in the other comment, or are just arguing in bad faith, which is even weirder because the guy was only explaining how the system has always worked.
> So then the courts need to find who is setting their chats to be deleted and order them to stop.

No, actually, it doesn't. Ordering a party to stop destroying evidence relevant to a current case (which is its obligation even without a court order) irrespective of whether someone else asks it to destroy that evidence is both within the well-established power of the court, and routine.

> Or find specific infringing chatters and order OpenAI to preserve these specified users’ logs.

OpenAI is the alleged infringer in the case.

Under this theory, if a company had employees shredding incriminating documents at night, the court would have to name those employees before ordering them to stop.

That is ridiculous. The company itself receives that order, and is IMMEDIATELY legally required to comply - from the CEO to the newest-hired member of the cleaning staff.

Times does not need user logs to prove such a thing if it was true. Times can show that it is possible, since they can show how their own users can access the text. Why would they need other users' data?
> Times does not need user logs to prove such a thing if it was true.

No, it needs to show how often it happens, to prove how much impact it's had.

Why would that matter? If people didn't use it as much, does that mean it doesn't matter because there were few people?
> Why would that matter

Because it's a copyright infringement case, so the existence and the scale of the infringement are relevant to both whether there is liability and, if so, how much; the issue isn't that it is possible for infringement to occur.

You have to argue damages. It actually has to have cost NYT some money, and for that you need to know some extent.
We don't even know if Times uses AI to get information from other sources either. They can get a hint of news and then produce their material.
OpenAI is also entitled to discovery. They can literally get every email and chat the Times has, and require that from this point on they preserve such logs.
> We don't even know if Times uses AI to get information from other sources either

Which is irrelevant at this stage. It's a legal principle that both sides can fairly discover evidence. Since finding out how much OpenAI has infringed copyright is pretty critical to the case, they need to find out.

After all, if it's only once or twice, that's a couple of dollars; if it's millions of times, that's hundreds of millions.

Who cares? That's not a legal argument and it doesn't mean anything to this case.
Oh, I was unaware that Times was inventing a novel technology with novel legal questions.

It’s very impressive they managed to do such innovation in their spare time while running a newspaper and site.

For the most part (there are a few exceptions), in the US lawsuits are not based on "possible" harm but actual observed harm. To show that, you need actual observed user behavior.
> Times can show that it is possible

The allegation is not merely that infringement is possible; the actual occurrence and scale are relevant to the case.