Ask HN: How are you handling data retention across your stack?
11 points
by preston-kwei
51 days ago
For people building SaaS with data across multiple systems (S3, DBs, caches, etc.), do you actually have a clean way to manage retention/deletion across all of them, especially when each customer has custom policies? Or is it more a mix of lifecycle rules, cron jobs, and manual cleanup? How are you doing this today? I feel like this is a blocker in enterprise deals when selling to regulated industries.
I used to work on a data platform team and built a cleaning service that used tags and object hierarchy trees to find and clean old PII data. Not an easy thing to do, as our data analytics bucket had over 7 PiB of data.
Overall, the architecture was based on three components: detector, enforcer, and cleaner. The detector sifted through the data lake to find PII datasets (LLM-based); the enforcer tracked down the ETL for those datasets in our VCS to set the appropriate tags/metadata (custom coding agent); finally, the cleaner used search to find and clean the data based on that metadata.
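
For anyone trying to picture the flow, here is a rough in-memory sketch of the detector / enforcer / cleaner split in Python. It is not the actual service: every name here (Dataset, detect_pii, enforce_tags, find_expired, clean, naive_pii_check) and the tag keys "pii" and "retention_days" are made up for illustration. The LLM classifier is replaced by a trivial placeholder, and the enforcer just writes tags directly instead of tracing ETL code in a VCS.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Dataset:
    path: str                 # hypothetical object key or prefix
    created_at: datetime      # when the data was written
    sample: str = ""          # small content sample the detector inspects
    tags: dict = field(default_factory=dict)


def detect_pii(datasets, looks_like_pii):
    """Detector: return the datasets the classifier flags as PII."""
    return [d for d in datasets if looks_like_pii(d.sample)]


def enforce_tags(flagged, retention_days):
    """Enforcer: record retention metadata on the flagged datasets."""
    for d in flagged:
        d.tags["pii"] = "true"
        d.tags["retention_days"] = str(retention_days)


def find_expired(datasets, now):
    """Cleaner, step 1: select tagged datasets past their retention window."""
    expired = []
    for d in datasets:
        if d.tags.get("pii") != "true":
            continue
        ttl = timedelta(days=int(d.tags["retention_days"]))
        if now - d.created_at > ttl:
            expired.append(d)
    return expired


def clean(datasets, now, delete):
    """Cleaner, step 2: delete expired datasets and return their paths."""
    removed = []
    for d in find_expired(datasets, now):
        delete(d.path)  # caller supplies the real delete (S3, DB, cache, ...)
        removed.append(d.path)
    return removed


def naive_pii_check(sample):
    # Placeholder standing in for the LLM-based classifier (assumption).
    return "@" in sample


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    lake = [
        Dataset("analytics/users/export", now - timedelta(days=400), "alice@example.com"),
        Dataset("analytics/users/recent", now - timedelta(days=10), "bob@example.com"),
        Dataset("analytics/metrics/cpu", now - timedelta(days=400), "cpu=0.9"),
    ]
    enforce_tags(detect_pii(lake, naive_pii_check), retention_days=365)
    print(clean(lake, now, delete=lambda path: None))
```

Keeping the delete call as an injected function means the same selection logic could be pointed at different stores, and per-customer policies could be modeled by passing a different retention_days per tenant when tagging.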