Hacker News
by jalev 8 days ago
This is what I'm wondering too. We've signed confidentiality agreements with all the big players (as I'm sure every other company has), which are supposed to ensure our data is both segregated and not used for training. I don't trust these companies to honor that; their business is taking what we have and training their models on it.

Yeah, I always wonder if they do some type of obfuscation and transformation on the private data and find a way to backdoor the info without technically using it directly.
I wonder if there's a way to include data so unique that you could prove the model was trained on it and sue later.
Unique data like that is unlikely to have any impact on the final learned weights after training. SGD, Adam, and the other hill-climbing solvers abhor jagged edges from "novel" trade secrets and the like, unless it turns out everyone had the same secret genius idea (and it became a pattern to learn).