by _jab 380 days ago

> How will you store my data and who can access it?

> The content covered by the court order is stored separately in a secure system. It’s protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations.

> Only a small, audited OpenAI legal and security team would be able to access this data as necessary to comply with our legal obligations.

So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.

Some of the other language in this post, like repeatedly calling the lawsuit "baseless", really makes this just read like an unconvincing attempt at a spin piece. Nothing to see here.

tptacek 380 days ago

No, there is a whole news cycle about how chats you delete aren't actually being deleted because of a lawsuit; they essentially have to respond. It's not an attempt to spin the lawsuit; it's about reassuring their customers.

VanTheBrand 380 days ago

The part where they go out of their way to call the lawsuit baseless is spin though, and mixing that with this messaging does exactly that: it presents a mixed message. The NYT lawsuit is objectively not baseless. OpenAI did train on the Times and ChatGPT does output information from that training. That’s the basis of the lawsuit. NYT may lose, this could end up being considered fair use, it might ultimately be a flimsy basis for a lawsuit, but to say it’s baseless (and with nothing to back that up) is spin and makes this message less reassuring.

tptacek 380 days ago

No, it's not. It's absolutely standard corporate communications. If they're fighting the lawsuit, that is essentially the only thing they can say about it. Ford Motor Company would say the same thing (well, they'd probably say "meritless and frivolous").

bee_rider 380 days ago

Standard corporate spin, then?

bunderbunder 379 days ago

No, this isn't even close to spin, it's just a standard part of defending your case. In the US tort system you need to be constantly publicly saying you did nothing wrong. Any wavering on that point could be used against you in court.

jmull 379 days ago

This is a funny thread. You say "No" but then restate the point with slightly different words. As if anything a company says publicly about ongoing litigation isn't spin.

tptacek 380 days ago

No? "Spin" implies there was something else they could possibly say.

justacrow 379 days ago

They could choose to not say it

mmooss 380 days ago

I haven't heard that interpretation; I might call it spin of spin.

mrgoldenbrown 379 days ago

If you're being held at gunpoint and forced to lie, your words are still a lie. Whether you were forced or not is a separate dimension.

bee_rider 379 days ago

That is unrelated to what the expression means.

adamsb6 379 days ago

I’m typing these words from a brain that has absorbed copyrighted works.

mhitza 379 days ago

My understanding is that they have to keep chats based on an order, *as a result of their previous accidental deletion of potential evidence in the case*[0].

And per their own terms they likely only delete messages "when they want to" given the big catch-alls. "What happens when you delete a chat? -> It is scheduled for permanent deletion from OpenAI's systems within 30 days, unless: It has already been de-identified and disassociated from your account"[1]

[0] https://techcrunch.com/2024/11/22/openai-accidentally-delete...

[1] https://help.openai.com/en/articles/8809935-how-to-delete-an...

ofjcihen 379 days ago

They should include the part where the order is a result of them deleting things they shouldn’t have then. You know, if this isn’t spin.

Then again, I’m starting to think OpenAI is gathering a cult-leader-like following where any negative comments will result in devoted followers or those with something to gain immediately jumping to its defense no matter how flimsy the ground.

gruez 379 days ago

>They should include the part where the order is a result of them deleting things they shouldn’t have then. You know, if this isn’t spin.

From what I can tell from the court filings, prior to the judge's order to retain everything, the request to retain everything was coming from the plaintiff, with openai objecting to the request and refusing to comply in the meantime. If so, it's a bit misleading to characterize this as "deleting things they shouldn’t have", because what they "should have" done wasn't even settled. That's a bit rich coming from someone accusing openai of "spin".

ofjcihen 379 days ago

Here’s a good article that explains what you may be missing.

https://techcrunch.com/2024/11/22/openai-accidentally-delete...

gruez 379 days ago

Your linked article talks about openai deleting training data. I don't see how that's related to the current incident, which is about user queries. The ruling from the judge for openai to retain all user queries also didn't reference this incident.

ofjcihen 379 days ago

Sure.

Without this devolving into a tit for tat: the article explains, for those following this conversation, why it’s been elevated to a court order and not just an expectation to preserve.

mmooss 380 days ago

> It's not an attempt to spin the lawsuit; it's about reassuring their customers.

It can be both. It clearly spins the lawsuit - it doesn't present the NYT's side at all.

roywiggins 379 days ago

It would be extremely unusual (and likely very stupid) for the defendant in a lawsuit to post publicly that the plaintiff maybe has a point.

fallingknife 379 days ago

Why does OpenAI have any obligation to present the NYT's side?

mmooss 379 days ago

Who said 'obligation'?

conartist6 379 days ago

It's hard to reassure your customers if you can't address the elephant in the room. OpenAI brought this on themselves by flouting copyright law and assuring everyone else that such aggressive and probably-illegal action would be retroactively acceptable once they were too big to fail.

lxgr 380 days ago

If the stored data is found to be relevant to the lawsuit during discovery, it becomes available to at least both parties involved and the court, as far as I understand.

sashank_1509 380 days ago

Obviously OpenAI’s point of view will be their point of view. They are going to call this lawsuit baseless; otherwise they would not be fighting it.

ivape 379 days ago

To me it's pretty clear the way this will happen. You will need to buy additional credits or subscriptions through these LLMs that feed payment back to things like the NYT and book publishers. It's all stolen. I don't even want to hear it. This company doesn't want to pay up and is willing to let users' privacy hang in the balance to draw the case out until they get sure footing with their device launches or the like (or additional markets like enterprise, etc).

Workaccount2 379 days ago

> It's all stolen.

LLMs are not massive archives of data. The big models are a few TB in size. No one is forgoing a NYT subscription because they can ask ChatGPT to print out NYT news stories.

edbaskerville 379 days ago

Regardless of the representation, some people are replacing news consumption generally with answers from ChatGPT.

fallingknife 379 days ago

Copyright is pretty narrowly tailored to verbatim reproduction of content so I doubt they will have to pay anything.

tiahura 379 days ago

Incorrect. Copyright applies to derivative works.

vel0city 379 days ago

Even then, it's possible to prompt the model to exactly reproduce the copyrighted works.

fallingknife 379 days ago

Please show me one of these prompts

hiddencost 380 days ago

> So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.

I am not an OpenAI stan, but this needs to be responded to.

The first principle of information security is that all systems can be compromised and the only way to secure data is to not retain it.

This is like saying "well, I know they didn't want to go skydiving, but we forced them to go skydiving and they died because they had a stroke mid-air; it's their fault they died."

Anyone who makes promises about data security is at best incompetent and at worst dishonest.

nhecker 379 days ago

Data is a toxic asset. -- https://www.schneier.com/essays/archives/2016/03/data_is_a_t...

JohnKemeny 380 days ago

> Anyone who makes promises about data security is at best incompetent and at worst dishonest.

Shouldn't that be "at best dishonest and at worst incompetent"?

I mean, would you rather be a competent person telling a lie or an incompetent person believing you're competent?

HPsquared 379 days ago

An incompetent but honest person is more likely to accept correction and respond to feedback generally.

pritambarhate 380 days ago

Maybe because you are not an OpenAI user. I am. I find it useful and I pay for it. I don't want my data to be retained beyond what's promised in the Terms of Use and Privacy Policy.

I don't think the judge is equipped to handle this case if they don't understand how their order jeopardizes the privacy of millions of users worldwide who don't even care about NYT's content or bypassing their paywalls.

conartist6 379 days ago

You live on a pirate ship. You have no right to ignore the ethics and law of that just because you could be hurt in a conflict related to piracy.

DrillShopper 379 days ago

The OpenAI Privacy Policy specifically allows them to keep data as required by law.

mmooss 380 days ago

> who don't even care about NYT's content or bypassing their paywalls.

Whether or not you care is not relevant, and not caring is usually the case for customers. If a drug company resold an expensive cancer drug without holding the IP, you might say 'their order jeopardizes the health of millions of users worldwide who don't even care about Drug Co's IP.'

If the NYT is right - I can only guess - then you are benefitting from the NYT IP. Why should you get that without their consent and for free - because you don't care?

> (jeopardizes)

... is a strong word. I don't see much risk - the NYT isn't going to de-anonymize users and report on them, or sell the data (which probably would be illegal). They want to see if their content is being used.