HN Mirror



	by wanderinghogan 729 days ago
	Why not include this data in their AI training models? Personally, I was irritated by the recent, quiet change that requires an email 'opt-out' to keep your corporate Slack from being used in their AI training models. I guess they can double dip? Have you pay the pennies for data retention, and also use your corporate communications to train the next things they will sell you?

1 comments

simonw 729 days ago

Because most companies genuinely don't value training on user data in that way.

It just isn't that valuable, even without the huge amount of negative publicity attached to doing that.

The cutting-edge AI labs are leaning much more into high-quality data (licensed from the Associated Press, for example) and synthetic data, which it turns out is a huge part of the training data for Claude and Microsoft's Phi series.

Andrej Karpathy said: "The average webpage on the internet is so random and terrible it's not even clear how prior LLMs learn anything at all." - https://twitter.com/karpathy/status/1797313173449764933

altdataseller 729 days ago

But conversations in Slack aren’t your average webpage. Aside from the channels used for automated messages and memes, a lot of in-depth, quality conversations happen on Slack, on a large variety of topics.

simonw 729 days ago

And it's all full of potentially private details.

Can you imagine the storm of bad publicity that would emerge the first time some company has details of an internal strategy leaked because some chatbot ended up parroting those details back to a competitor?
