HN Mirror

by sillyfluke 661 days ago

>Not entirely sure what you mean - the date I proposed (2026-08-23) is a full two years from now.

My bad, I could have sworn I read 2025-08-23.

>I think training generative AI on private data would be a huge violation and a big deal.

Just to be clear, I think a local LLM user input leak is by itself a big enough deal before getting into using it as training data for a public MS LLM. The former is getting hit by a car, the latter is getting hit by a train depending on how bad a "mixer" the public LLM being trained is.

I would take a $100 bet that has me winning if, by 2026-08-23, there is a data leak, the data is shown to be accessible by a third party, or there is a case where it has been used as training data, provided it's released by Jan 2025.

1 comment

Ukv 660 days ago

I think I'm probably more interested in concerns about novel/systematic abuse around this feature (like a decision to send these snapshots to OpenAI for training). I'm less interested in the scenario where there's no change from Microsoft (so files are still stored encrypted locally on-disk) but some one-off event (malware, 0-day exploits, choosing to sync to Google Drive) exposes a user's files in the same way their browser's password DB could have been exposed.
