| HN Mirror


	by genxy 28 days ago
	This is because the users are training the product. They need training data, so they sell inference at the price of power.

3 comments

gruez 28 days ago

> API Services. If you use the API services, we will collect your IP address and the content (text, audio, video, picture) you submit to analyze the relevant instructions based on the model you select and to generate the returned content. Xiaomi will not use the content you provide for model training or any other purposes.

https://privacy.mi.com/XiaomiMiMoPlatform/en_GB/

koteelok 28 days ago

Chinese corporation would never lie

colechristensen 28 days ago

And what legal recourse do you have if they don't follow those rules?

windexh8er 28 days ago

You have no recourse in the US, either. "Trust no one" is the only path, given that all of the training data was stolen in the first place.

It will come to light that one or many of the frontier providers held on to the data, changed their ToS, and trained on it later, at a minimum. But I think they just don't care and will train regardless. None of them abide by any level of ethics that would actually prevent them from leveraging an opportunity.

Tiberium 28 days ago

ChatGPT (the setting is shared with Codex) and Claude (shared with Claude Code) also have sharing enabled by default, so why aren't they cheaper?

Springtime 28 days ago

There's evidence that various third-party models (including DeepSeek) were trained using distillation from the models behind those leading services. So they have more flexibility with pricing.

malnourish 28 days ago

Is that fundamentally any different from what, e.g., Meta and OpenAI have done?

Besides, hasn't SCOTUS ruled that raw LLM output isn't subject to copyright? So these companies would be breaking a ToS at worst.

behnamoh 28 days ago

So? And Anthropic/OpenAI literally stole copyrighted content to train their models.

Springtime 28 days ago

The point was that distilling based on others' models for training means they're not spending the same amount on R&D and/or training, giving them headroom in other ways (responding to the parent's point). It wasn't a comment reflecting on copyright/fair use.

behnamoh 28 days ago

In the same fashion, Anthropic/OpenAI also reduced their training cost by not purchasing the license to copyrighted work and stealing it instead.

koteelok 28 days ago

They are? They give away thousands of dollars via subs.

camelmel 28 days ago

Is this training data even valuable? Usually AI data annotators get paid to write LLM responses, but here all they'd be getting is a bunch of user queries.

VerTiGo_Etrex 28 days ago

1. Feed the same queries into Claude 2. Train on the Claude responses 3. ??? 4. Profit

This has been the strategy for months now
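
For anyone wondering what steps 1 and 2 look like mechanically, here is a minimal sketch of the data-collection half. Everything in it is an assumption for illustration: `build_distillation_set` and `fake_teacher` are hypothetical names, and the teacher is a local stub rather than a call to any real provider's API. It only shows how logged user queries could be paired with a stronger model's answers to form supervised training examples; the fine-tuning step itself is left out.

```python
from typing import Callable


def build_distillation_set(
    queries: list[str],
    teacher: Callable[[str], str],
) -> list[dict[str, str]]:
    """Pair each logged user query with the teacher model's response."""
    dataset = []
    for query in queries:
        # Step 1: feed the same query into the teacher model.
        response = teacher(query)
        # Step 2 (input): store the pair as a supervised training example.
        dataset.append({"prompt": query, "completion": response})
    return dataset


def fake_teacher(prompt: str) -> str:
    # Hypothetical stand-in for a stronger model; a real pipeline would
    # call that model here instead of returning a canned string.
    return f"teacher answer to: {prompt}"


if __name__ == "__main__":
    logged_queries = ["What is 2 + 2?", "Summarize this article."]
    for pair in build_distillation_set(logged_queries, fake_teacher):
        print(pair)
```

The point of the sketch is that the user queries alone supply the prompts, and the responses come from the teacher, so no human annotators are needed to write completions.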
