by dspillett 1024 days ago
> unless the AI starts simply repeating my private data in response to questions

That is a concern some have, particularly around Copilot and the fact that it has been trained on a lot of copyleft-licensed code in public repositories.

They assure us that it is not possible for blocks of code to be regurgitated that would break things like *GPL, but they have yet to explain why, if that assurance is 100% definitely true, they have not included any of their private code in the training set. Surely they consider that their code is of good quality and would be valuable to include in the model.

> if it learns to draw better hands by looking at my holiday photos or whatever, then I don't see the problem

And if it gives an advertising firm working for a product you'd rather not be associated with an image of a family that look _very_ like yours? Again, the same assurance is given as for Copilot, but again not everyone is assured by the assurance.

And of course it could happen anyway by chance even if your family is not in the training set. But I don't leave my doors unlocked just because someone with a good lock-pick could get in anyway.

And they are not doing it for some great communal benefit (well, their individual coders may be, but the company certainly isn't); they are doing it for commercial benefit. I'd prefer they didn't do it with my data, or if they do, I'd like my slice, however small, thankyouverymuch.