| HN Mirror



	by namaria 420 days ago
	If you have been giving the LLMs these problems, there is a nonzero chance that they have already been used in training.

1 comments

rovr138 420 days ago

This depends heavily on how you use these services and how you have things configured: whether you're using the API or the web UI, and which plan you're on. On team or enterprise plans, training on your data is disabled by default. On personal plans, it can be disabled.

Here are the OpenAI and Anthropic policies:

https://help.openai.com/en/articles/5722486-how-your-data-is...

https://privacy.anthropic.com/en/articles/10023580-is-my-dat...

https://privacy.anthropic.com/en/articles/7996868-is-my-data...

And obviously, that doesn't include self-hosted models.


namaria 420 days ago

How do you know they adhere to this in all cases?

Do you just completely trust them to comply with self-imposed rules when there is no way to verify, let alone enforce, compliance?


blagie 420 days ago

They probably don't, but it's still a good protection if you treat it as a more limited one. If you assume:

[ ] Don't use

doesn't mean "don't use" but "don't get caught," then it still limits a lot of types of uses and sharing (any with externalities sufficient they might get caught). For example, if personal data was being sold by a data broker and being used by hedge funds to trade, there would be a pretty solid legal case.


namaria 420 days ago

> it still limits a lot of types of uses and sharing (any with externalities sufficient they might get caught)

I don't understand what you mean.

> For example, if personal data was being sold by a data broker and being used by hedge funds to trade

It's pretty easy to buy data from data brokers. I routinely get spam on many channels. I assume that my personal data is being commercialized often. Don't you think that already happens frequently?

Honestly, I would not put anything into a text box on the internet unless I assume it is becoming public information.

A few months ago some guy found discarded storage devices full of medical data for sale in Belgium. No data that is recorded on media you do not control is safe.


gvhst 420 days ago

SOC-2 auditing, which both Anthropic and OpenAI have done, does provide some verification.


diggan 420 days ago

That's interesting. How do I get access to those audits/reports, given I'm just an end user?


rovr138 420 days ago

You can fill out the form here: https://trust.openai.com/


namaria 420 days ago

The audit performed by a private entity called "Insight Assurance"?

Why do you trust it?


rovr138 420 days ago

Oh, so now EVERYTHING is fake unless personally verified by you in a bunker with a Faraday cage and a microscope?

You're free to distrust everything. However, the idea that “I don’t trust it so it must be invalid” isn’t a solid argument. It’s just your personal incredulity. You asked if there’s any verification, and SOC-2 is one. You might not like it, but it's right there.

Insight Assurance is a firm doing these standardized audits. These audits carry actual legal and contractual risk.

So, yes, be cautious. But being cautious is different than 'everything is false, they're all lying'. In this scenario, NOTHING can be true unless *you* personally have done it.


namaria 420 days ago

No, you're imposing a false dichotomy.

I merely said I don't trust the big corporation with a data-based business not to profit from the data I provide it with in any way it can, even if it hires some other corporation - whose business is to be paid to provide such assurances on behalf of those who pay it - to say that they pinky promise to follow some set of rules.
