Hacker News new | ask | show | jobs
by slotrans 619 days ago
This is true, and in principle a good thing, but in the time since Parquet and ORC were created GDPR and CCPA are things that have come to exist. Any format we build in that space, today, needs to support in-place record-level deletion.
3 comments

Yea so the thing you do for this is called "compaction", where you effectively merge the original + edits/deletes into a new immutable file. You then change your table metadata pointer to point at the new compacted file, and delete the old files from S3.

Due to the way S3 and the ilk are structured as globally replicated KV stores, you're not likely to get in-place edits anytime soon, and until the cost structure incentivizes otherwise you're going to continue to see data systems that preference immutable cloud storage.

You can avoid that if you save only per-user encrypted content (expensive, I know). That way you just should have to revoke that key to remove access to the data. Advantage is you cannot forget any old backup etc.
I mean, you can have it you’ve just got to be happy to bear the cost of rewriting the file every time you mutate a row.