by lstamour
1435 days ago
The answer is always “it depends,” but I think if a datetime is a UTC timestamp, such as a record of when an event happened, then with random sampling it shouldn’t matter? It’s just a timestamp. The information it carries might include location or timing relative to other events, and it could be correlated with other data, but on its own? It doesn’t need anonymization. Likewise, the sequence of events should be safe to use. I get that you can look up or de-anonymize an event by its timestamp, and the same is true of ID numbers. But it’s worse for ID numbers, because these are often permanent and re-used across multiple events.

But yeah, the risk with anonymized data is that it’s never truly both anonymous and useful. Truly anonymous data might as well be junk or random data, whereas anonymized data has some purpose to fulfil. Perhaps “realistic” analytics is required, or you want to troubleshoot a production issue without revealing to engineers who did what. So you anonymize the fields they shouldn’t see, and create a subset of data that reproduces the issue…? Anonymized data is almost always a worse approach than generating data from algorithmic or random sources, but sometimes we need anonymized or restricted data to start that process.
A good example is: https://gretel.ai/blog/gretel-ai-illumina-using-ai-to-create...
Full disclosure, I work at Gretel, but I thought this was relevant enough to mention.
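
To make the "anonymize the fields they shouldn't see, and create a subset" idea a bit more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the field names (`user_id`, `email`, `timestamp`, `action`), the `SECRET_KEY`, and the helper names are hypothetical, and a keyed hash only pseudonymizes an ID, it does not make the data truly anonymous. UTC timestamps are passed through untouched, as argued above.

```python
import hashlib
import hmac
import random

# Hypothetical secret; in practice it would be kept away from whoever
# receives the export and rotated so pseudonyms differ between exports.
SECRET_KEY = b"rotate-me-per-export"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a permanent ID with a keyed hash (HMAC-SHA256, truncated)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def anonymized_sample(events, k, seed=None):
    """Randomly sample up to k events and keep only the shareable fields.

    Each event is assumed to be a dict with 'user_id', 'timestamp'
    (ISO 8601 UTC string) and 'action' keys; anything else, such as
    'email', is dropped.
    """
    rng = random.Random(seed)
    sample = rng.sample(events, min(k, len(events)))
    cleaned = [
        {
            "user": pseudonymize(e["user_id"]),  # permanent ID is hidden
            "timestamp": e["timestamp"],         # UTC timestamp kept as-is
            "action": e["action"],
        }
        for e in sample
    ]
    # Preserve the order of events; same-format ISO 8601 UTC strings
    # sort chronologically.
    return sorted(cleaned, key=lambda e: e["timestamp"])


if __name__ == "__main__":
    events = [
        {"user_id": "alice", "email": "a@example.com",
         "timestamp": "2020-01-01T10:00:00Z", "action": "login"},
        {"user_id": "bob", "email": "b@example.com",
         "timestamp": "2020-01-01T10:05:00Z", "action": "upload"},
        {"user_id": "alice", "email": "a@example.com",
         "timestamp": "2020-01-01T10:07:00Z", "action": "logout"},
    ]
    for row in anonymized_sample(events, k=2, seed=42):
        print(row)
```

The same user maps to the same pseudonym within one export, so the sequence of events per user stays usable for troubleshooting, which is also exactly why this is weaker than generating synthetic data.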