by CodesInChaos
40 days ago
Some of the most important ones are:

1. Invoice PDFs. Individually small, but there are a hundred million of them. Deleted after 10 years or when the tenant deletes their account.
2. Reports and exports. Few but potentially big files. If an export logically consists of multiple files, it's stored as a zip file. They live for 30 days or until the tenant deletes their account.
3. Streaming database exports using AWS Database Migration Service for replication into Snowflake.

Every file has an entry in the database tracking its storage location and status. Grouping them by tenant, (sub)type or time-interval makes sense for these. But "dataset" isn't an applicable concept.
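
To make the per-file tracking described above concrete, here is a minimal Python sketch. It is not the actual schema: `FileRecord`, `RETENTION`, `is_expired` and `group_by_tenant_and_kind` are hypothetical names, the `kind` values are invented labels, and 10 years is approximated as 3650 days.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed retention periods, loosely based on the description above.
# 10 years is approximated as 3650 days (leap days ignored).
RETENTION = {
    "invoice_pdf": timedelta(days=3650),
    "export": timedelta(days=30),
}


@dataclass
class FileRecord:
    """One database row per stored file (hypothetical shape)."""
    tenant_id: str
    kind: str              # e.g. "invoice_pdf" or "export"
    storage_location: str  # e.g. an object-store key
    status: str            # e.g. "stored", "deleted"
    created_at: datetime


def is_expired(record, now, tenant_deleted=False):
    """A file is due for deletion if its tenant is gone or its retention has elapsed."""
    if tenant_deleted:
        return True
    retention = RETENTION.get(record.kind)
    if retention is None:
        # Unknown kinds are kept; this sketch defines no policy for them.
        return False
    return now >= record.created_at + retention


def group_by_tenant_and_kind(records):
    """Group records by (tenant, kind), the grouping that makes sense here."""
    groups = defaultdict(list)
    for record in records:
        groups[(record.tenant_id, record.kind)].append(record)
    return dict(groups)
```

The point of the sketch is that ownership and lifetime follow from the row (tenant, kind, creation time), so nothing like a "dataset" is needed to decide what to delete.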
The post is more about the pipeline / ML / log / export world where ownership isn't enforced by application code.
The DMS case sits somewhere in between - there's a per-table grouping that could be useful, but the files are usually transient enough that it doesn't matter much. Different problem from yours.