HN Mirror: Hacker News


	by relyks 496 days ago
	Why wouldn't it be? OpenAI and Anthropic keep everyone's prompts and use them for training too.

3 comments

energy123 496 days ago

Because of how tightly corporations and the state are fused in China's governance.

> A Leninist system features an authoritarian regime in which the ruling elite monopolizes political power in the name of a revolutionary ideology through a highly articulated party structure that parallels, penetrates, and dominates the state at all levels and extends to workplaces, residential areas, and local institutions.

From: https://www.csis.org/analysis/soviet-lessons-china-watching

All user data submitted to DeepSeek is accessible to the CCP.

csmpltn 496 days ago

As opposed to the US?

noduerme 496 days ago

Yes. These are not comparable political systems. In the US, the information you share can be accessed by law enforcement with the approval of a judge if a crime is suspected. But when the government improperly accesses your data, it actually destroys its own case against you, because anything from that poisoned tree of evidence can be thrown out in court. Even when governmental power is abused in the US, it is nothing like the routine surveillance and suppression that chills free thought and speech in a totalitarian dictatorship like China.

csmpltn 496 days ago

> "In the US..."

I'm sorry, but your idea of how the US works is a complete fairytale. You need to get a serious reality check on how the US actually works in real life. The law in the US is applied selectively (depending on the profiles involved, severity of case, political backdrop, etc). There's plenty of corruption, misaligned incentives, and corporate meddling. I can't count the number of cases from the past 30+ years that demonstrate this.

buyucu 496 days ago

It's weird how people pretend the Edward Snowden disclosures never happened.

noduerme 485 days ago

Also weird how people pretend Snowden wasn't just trying to draw equivalence between the US and the dictatorship where he currently resides, on behalf of said dictatorship.

fragmede 495 days ago

It's weird how people think companies read about Edward Snowden and then didn't do shit about it and just let the NSA keep tapping their lines.

https://www.npr.org/sections/thetwo-way/2014/03/20/291959446...

noduerme 485 days ago

Probably because we have a system of laws wherein a good corporate legal team can generally outmaneuver what passes for our secret police.

buyucu 494 days ago

It's illegal for US companies to deny the US government access to data. Have you heard of the CLOUD Act?

okasaki 496 days ago

https://en.wikipedia.org/wiki/Parallel_construction

buyucu 496 days ago

All data you submit to Google, OpenAI, Meta, Facebook, Twitter... is accessible to the US government.

The US government has been much more belligerent, and it's very natural to see DeepSeek as the lesser of the evils.

scrollop 496 days ago

The CCP will never be the lesser of two evils.

buyucu 496 days ago

CCP did not invade Iraq, Libya, Afghanistan, bomb Syria or support the Palestinian Genocide.

csmpltn 496 days ago

> "CCP did not invade Iraq, Libya, Afghanistan, bomb Syria or support the Palestinian Genocide."

1. There has been no genocide in Palestine.

2. The CCP meddles in other countries to an equal if not worse degree - both militarily and politically/economically. It routinely imprisons and erases millions of its own citizens. It works to annex territories that aren't part of China (today). It funds and arms Russia, Iran, Syria...

You seem like the kind of person that selectively applies and practices their morals, depending on whether the story aligns with your agenda.

buyucu 496 days ago

You seem like a supporter of mass murder.

astrange 496 days ago

Anthropic says

> To date we have not used any customer or user-submitted data to train our generative models.

https://www.anthropic.com/news/claude-3-5-sonnet

There's an obvious problem with the concept of training on user prompts; how would training on a bunch of questions cause it to know the answers?

lukan 496 days ago

"There's an obvious problem with the concept of training on user prompts; how would training on a bunch of questions cause it to know the answers?"

I imagine by analysing the chat? If the user says thanks at the end, or gives a thumbs up, it was likely a useful and correct answer that could be included in further training. Or at least considered for future training, and I cannot imagine them not considering and experimenting with it.

space_fountain 496 days ago

User queries were, at least historically, useful for training smaller models from larger models. You need to know the kinds of questions real people ask to train a model that's good at answering those questions.

CarRamrod 496 days ago

Back when I started using LLMs for writing code I would type out long, gently phrased explanations about why it was wrong, as if I was teaching a pupil, hoping it would help. I'm sure a lot of us did. If they can parse and mine those prompts, they'll have a nice little metacorpus to build on.

Now I just tell it to stop being stupid over and over until it does a good job. I wonder if it would improve the model to keep all of the beratement in the training data.

Edit: Apparently a 'metacorpus' is a swollen nematode ass. My sincerest apologies, bros.

cheshire_cat 496 days ago

Anthropic states that they don't train on the inputs and outputs of their commercial offerings unless you explicitly opt-in: https://privacy.anthropic.com/en/articles/7996868-i-want-to-...

Do you think they're lying, or were you speaking about free tier offerings?

WhereIsTheTruth 496 days ago

If they lied about copyright infringements, why wouldn't they lie for data collection too?

anon373839 496 days ago

The bigger question is what ELSE are Anthropic/OpenAI/et al. doing with your data? Training is just one of many ways to exploit users’ data. Some of the other possibilities are truly chilling.