Hacker News new | ask | show | jobs
by pavon 3021 days ago
If I was responsible for protecting privacy of data, I don't know that I would be comfortable with this method. Anonymization of data is hard, and frequently turns out to be not as anonymous as originally thought. At a high level, this sounds like they are training a ML system on your data, and then using it to generate similar data. What sort of guarantees can be given that the ML system won't simulate your data with too high of fidelity? I've seen too many image generators that output images very close to the data they were trained on. You could compare the two datasets and look for similarities, but you'd have to have good metrics of what sort of similarity was bad and what sort was good, and I could see that being tricky, in both directions.

Although, I suppose that if the data was already anonymized to the best of your ability, and then this was run on top of that, as a additional layer of protection, that might be okay.