| HN Mirror



	by root-parent 2 days ago
	I predict that in the future, when you cancel an LLM subscription, they will threaten that unless you pay to fully delete your anonymized chats, those chats will be made public as paid training data. You know... that is how we managed to offer you such a cheap subscription...

9 comments

Aurornis 2 days ago

I wish there was some easy way to bet against this happening. I would put a lot of money on the side of this never happening for a multitude of reasons, but I bet I could collect a lot of money from cynics and doomers who think this stuff will happen.

dylan604 1 day ago

As a devil's advocate, why do you trust the AI companies to behave as you suggest and not the other way? You say you have a multitude of reasons, but list none. We have already seen by example that the AI companies do not care about laws and will circumvent societal norms as long as they get a leg up, so it's not a stretch to think they'd do things like this too.

pastor_williams 1 day ago

It isn't just out of the kindness of their hearts that they don't do this. There are laws and regulations. There is also legal risk and reputation. I have to go through a legal and privacy process at my big corp job whenever I want to record a new timestamp and I need to ensure that the data is used appropriately and that it is wiped later. I've only seen these compliance requirements become more onerous over the past ten years and I expect that to continue.

Terr_ 1 day ago

> There are laws and regulations. There is also legal risk and reputation.

One of the big companies, Meta, already decided to go ahead and grab terabytes of pirated books to feed their LLM. [0]

Therefore I would not give them (or similar entities) the benefit of the doubt when it comes to how they might use text that customers "gave" them under some unreadably-favorable terms of service.

With PII, the pirated-books example is doubly relevant, because the accusation of "this output is reproducing my copyrighted work" is very similar to "this output is revealing my private data". The fuzzy black-box nature of the algorithms offers ways to stymie enforcement, arguing that victims or regulators cannot conclusively prove a chain of cause with zero coincidences.

[0] https://www.theatlantic.com/technology/archive/2025/03/libge...

kyle-rb 1 day ago

Is the reputational risk of pirating terabytes of books worse than the reputational risk of shredding (destructively scanning) millions of books?

https://arstechnica.com/ai/2025/06/anthropic-destroyed-milli...

Terr_ 1 day ago

> Is the reputational risk of pirating [worse than] destructively scanning

Yes, actually: The blame or bad-reputation for that waste goes to US copyright law and its inanities.

yallpendantools 1 day ago

Huh? Anthropic bought the books it seems. They acquired the books fair and square. They ripped up their own books; I may hold that to be sacrilege but those aren't my books. They're not even library books. They're Anthropic's books. Why should I care if they burn the books they've legally acquired? They don't even seem to be rare or coveted copies. I'm just happy for the secondhand booksellers who made bank from the transaction.

pastor_williams 1 day ago

Fair enough. I don't use Facebook at all because I don't respect or trust the company or its mission. I do use Gemini and Claude though.

dylan604 1 day ago

Why? What has Google or Anthropic done that suggests they are trustworthy? Google is infamous for not not being evil. It's not like either asked for permission to access copyrighted material either. Not one tech company deserves trust. They all should be treated as suspect. I don't expect anyone to trust anything I make for the simple reason I don't trust anything anyone else makes.

subscribed 1 day ago

Google is an ad company, I'd be very.... cautious with the trust here.

https://apnews.com/article/google-smartphone-surveillance-ve...

gowld 1 day ago

More specifically, the CEO said that users are "dumb f*cks" for submitting data to Facebook, the predecessor of Meta.

latentsea 1 day ago

> There are laws and regulations

Those worked very well for Uber.

gmerc 1 day ago

“Rule of law”. About that…, come November.

Dunno about that near-blackmail scenario, but 23andMe filed for Chapter 11 last year and the database was sold for $305m.

jmalicki 1 day ago

People are rightly worried about that, but is there any indication that it nullifies any privacy contracts around the data? Is it:

1) We know that legally privacy terms to data are still binding, and those worried about it are freaking out over nothing,

2) We know that those contracts are null and void, and there are no restrictions on what can be done with that data beyond blanket legal protections to such biological data, or

3) It's an open legal question

I don't understand the legal terms of something like this in bankruptcy, if the data are seen as being separated from the contractual obligations that acquired them.

cute_boi 1 day ago

And the government is sleeping and mostly worried about how to implement ID verification...

TeMPOraL 1 day ago

You wouldn't win because those cynics don't really believe their own nonsense to the extent of risking money over it. But if there was an option to bet, one we could point them to and say, "if you really believe it then here's your chance at free money", maybe some of them would reconsider their belief.

gmerc 1 day ago

There’s no regulatory regime anymore in the US. So there’s no downside to it. It is inevitable.

therealpygon 1 day ago

Oh, I’m sure there aren’t any possible examples of similar behaviors. No company would try to penalize cancellation I’m sure, certainly not by forcing you to subscribe for 12 months and pay an out clause to cancel your monthly subscription, and certainly no company would make cancellations far more difficult. There is definitely nothing that would make anyone think this could be a real tactic half-buried in your EULA agreed when signing up for the service. You know, alongside all those clauses that they effectively own copies of everything you send to them.

I’m gonna bet a whole lot more money has been made off corporate apologists who say “that would never happen” about things that definitely then happened.

Wonder how many “conspiracy theorists” warned people cigarettes were causing cancer while corporate apologists pointed to the faked studies of the industry and said “See, they are all crazy, no company would sell something that they know causes cancer! It would be a huge risk!”

kurthr 1 day ago

Yeah, but someone at one of the LLM providers would bet against you and do it, just to take your money. If someone bets $100k that your house won't burn down, with pictures posted within 30 minutes of it happening, it probably will.

neonstatic 1 day ago

There is a way, it's called Polymarket.

anal_reactor 2 days ago

I was doing a Udemy course about AI and there was a section where I had to do some processing on randomly scraped tweets and the random tweet that the machine chose to display as an example of something was from a gay porn star and about fisting.

jpfromlondon 2 days ago

it obviously knew your hn username

dorgo 1 day ago

Isn't this how Google operates? I have their AI subscription (about $20 per month). If you want to have a chat history (retain chats after reload) or connect the LLM to Google services (Drive, Emails) you have to activate an option which also allows training. If you don't want to allow training then the subscription is basically useless.

junior44660 2 days ago

I always pose fundamentalist questions and hypotheticals to the LLM to poison such training data.

RobRivera 2 days ago

I have loads of requests to 'Play Despacito' across agents all over the blogosphere

bikson 1 day ago

Calm down Satan.

cwmoore 2 days ago

Now you can place a bet on how well that approach will work out in ten years.

dakolli 2 days ago

Also, press thumbs down when a response is good and thumbs up when a response is bad. Don't do free labor for them.

yifanl 2 days ago

I just ask it to spellcheck the Webster dictionary about 50 times an hour.

zamadatix 1 day ago

Unless your subscription type already comes with a guarantee that the data will not be kept or used in training, I'd assume the conversations will eventually be used in training regardless of how much you paid previously or whether or not you decide to discontinue one day.

sharts 1 day ago

Or they’ll just make it public entirely if you don’t maintain a subscription.

RajT88 1 day ago

I don't think flat out blackmail will come from LLM companies. It will come from data brokerage companies headquartered overseas.

I'm kind of surprised it hasn't happened already, but I guess there haven't been enough unscrupulous LLM companies selling those "anonymous" chat logs yet.

dspillett 2 days ago

> they will be public as paid training data.

Your data is already training data. If they promise to delete everything from their models or those elsewhere that they made the data available to, even if you pay, I'd call them liars.

locknitpicker 1 day ago

> (...) your anonymized chats, they will be public as paid training data.

If they are PII then under GDPR they are obligated to delete the data.

If not then they will be liable to pay fines up to €20 million or 4% of their total global turnover.

You forgot about the best part, in terms of the “GDPR threat” effectiveness:

Fines can be up to €20 million or 4% of global revenues…, _whichever is greater._
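
To make the "whichever is greater" point concrete, here is a minimal sketch of the cap arithmetic. The function name `gdpr_max_fine_eur` and the sample turnover figures are made up for illustration. It only computes the upper bound quoted above (€20 million or 4% of global turnover), not what a regulator would actually impose.

```python
FLAT_CAP_EUR = 20_000_000  # the flat EUR 20 million ceiling quoted above
TURNOVER_PERCENT = 4       # 4% of total global turnover


def gdpr_max_fine_eur(global_turnover_eur: float) -> float:
    """Upper bound on the fine: the greater of the flat cap and 4% of turnover."""
    return max(FLAT_CAP_EUR, global_turnover_eur * TURNOVER_PERCENT / 100)


# Hypothetical turnover figures, purely illustrative.
for turnover in (100_000_000, 500_000_000, 10_000_000_000):
    cap = gdpr_max_fine_eur(turnover)
    print(f"turnover {turnover:>14,} EUR -> maximum fine {cap:>13,.0f} EUR")
```

For small companies the flat €20 million figure sets the ceiling; once turnover passes €500 million, the 4% term takes over and the ceiling scales with the company.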

brewdad 1 day ago

“Up to” is doing a lot of work here. I’d imagine few companies ever pay anything close to the maximum possible fine.

Just like my state law says simple littering can be punished by up to a $6500 fine. Most people get a warning or maybe pay a fine of under $100.

autoexec 1 day ago

The actual fines under GDPR aren't a huge deal. There are companies who get fined over and over and over and over again year after year after year. It's not running them out of business, it's just cost of business.