HN Mirror

Hacker News


	by bayesian_horse 1525 days ago
	You can anonymize such data or get the necessary agreements for a small subset. All of which is tricky, but not impossible.


dragonwriter 1525 days ago

> You can anonymize such data

...thereby destroying the production-like features you wanted it for in testing, which you then need to recreate and reintroduce. You might as well just synthesize test data in the first place, since that's effectively what you end up doing anyway.

bayesian_horse 1524 days ago

I know you probably just want to be annoying, but really, there is a world of difference between completely synthetic and anonymized data. Without getting into specifics, anonymizing is just the operation of making deanonymization a lot harder. "Hard enough" is usually specified in some form by the regulator. You can identify an individual by their ECG data, for example; it's just really hard...

No, in actual practice you don't scrub the stuff you actually need to test.

dragonwriter 1523 days ago

> I know you probably just want to be annoying, but really, there is a world of difference between completely synthetic and anonymized data.

No, I’ve spent ~20 years in healthcare, where this has been a frequently recurring issue.

> No, in actual practice you don't scrub the stuff you actually need to test.

In actual practice, the stuff you really need to test often overlaps with the stuff you minimally have to scrub to legally deidentify the data. The most common outcome I’ve seen when people try this is that they both incur most of the work of generating synthetic data and fail to legally deidentify the source data.
