
by Kalium 2024 days ago

> Really? I've found the exact opposite - teams that used an RDBMS had to throw away their customer data under GDPR, because even though they had an entry in their database saying that the customer had agreed, they couldn't tell you what the customer had agreed to or when. Whereas teams using Kafka in the way you describe had an event record for the original agreement, and could tell you where any given piece of data came from.

This is absolutely wonderful! Unfortunately, this team decided to store data subject to GDPR deletion requests in Kafka, where deletion is quite difficult. That became a problem once deletion had to be done programmatically across many teams using the same set of topics.

The real nightmare came when this team, obsessed with the power of infinite retention periods, encountered PCI-DSS. You see, the business wanted to move away from Stripe and similar services to dealing with a processor directly, in order to save on transaction fees. So obviously they could just put credit card data into Kafka...
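The following Python sketch shows why deleting a specific record from Kafka is awkward. It loosely simulates log compaction, the main built-in way to "delete" a specific record: a newer record with the same key, or a tombstone (a null value), eventually supersedes older ones. The `compact` function and the `(key, value)` record layout are hypothetical simplifications. Real compaction runs per partition, on closed segments, and on a schedule, and none of that is modeled here. The sketch also shows the catch: a tombstone only removes records that share its key, so personal data embedded in events keyed by something else (here, an order ID) survives.

```python
from typing import Optional

# Hypothetical record shape: (key, value). A value of None stands in
# for a Kafka tombstone.
Record = tuple[str, Optional[bytes]]


def compact(log: list[Record], drop_tombstones: bool = False) -> list[Record]:
    """Keep only the latest record per key, preserving log order.

    Loosely mimics Kafka log compaction: older values for a key are
    discarded once a newer record (including a tombstone) exists.
    drop_tombstones=True also removes the tombstones themselves, standing
    in for what happens after the tombstone retention period elapses.
    """
    latest: dict[str, int] = {}
    for offset, (key, _) in enumerate(log):
        latest[key] = offset  # remember the newest offset for each key

    compacted: list[Record] = []
    for offset, (key, value) in enumerate(log):
        if latest[key] != offset:
            continue  # superseded by a newer record with the same key
        if value is None and drop_tombstones:
            continue  # the tombstone has done its job
        compacted.append((key, value))
    return compacted


# Events keyed by the data subject: a tombstone removes the email.
log = [
    ("user-1", b"alice@example.com"),
    ("user-2", b"bob@example.com"),
    ("user-1", None),
]
print(compact(log, drop_tombstones=True))

# Events keyed by order ID: the same tombstone misses the embedded email.
orders = [("order-9", b"user-1 alice@example.com"), ("user-1", None)]
print(compact(orders, drop_tombstones=True))
```

Compacting `orders` after a tombstone for `user-1` leaves the order event untouched, email included. This is why deletion has to be planned into the keying scheme up front. It is hard to retrofit across topics that many teams share.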


lmm 2023 days ago

Yeah, fair enough. I'd argue that this is something of a double standard (a traditional RDBMS may well keep copies of "deleted" data on dirty pages, and may well leave that data on the physical disk indefinitely, for much the same reasons Kafka does; it just makes it a bit fiddlier for you to access), but your legal team may decide that it's required.

I don't think your overall scorn is warranted - there are bigger problems that are endemic to RDBMS deployments, and the advantages of a stream-first architecture are very real - but there are genuine difficulties around handling data obliteration and it's something you need to design for carefully if you're using an immutable-first architecture and have that requirement.
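One commonly cited way to design for obliteration in an immutable log is crypto-shredding. Each data subject's fields are encrypted with a per-subject key held in a separate, mutable key store, and the data is "deleted" by destroying the key. Neither commenter describes this approach, so treat the sketch below as an illustration that rests on several assumptions. `KeyStore`, `encrypt` and `decrypt` are hypothetical names. The HMAC-SHA256 keystream is a toy stand-in for a real cipher; a production system would use a vetted authenticated cipher such as AES-GCM.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# An event as it would be appended to the log: (subject, nonce, ciphertext).
Event = tuple[str, bytes, bytes]


class KeyStore:
    """Hypothetical per-subject key store, kept OUTSIDE the immutable log."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def key_for(self, subject: str) -> bytes:
        # Create a random 256-bit key the first time a subject is seen.
        if subject not in self._keys:
            self._keys[subject] = secrets.token_bytes(32)
        return self._keys[subject]

    def get(self, subject: str) -> Optional[bytes]:
        return self._keys.get(subject)

    def shred(self, subject: str) -> None:
        # Destroying the key renders every event for this subject unreadable.
        self._keys.pop(subject, None)


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # TOY ONLY: HMAC-SHA256 in counter mode used as a keystream.
    # Unauthenticated and unreviewed; use a vetted AEAD in real systems.
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt(store: KeyStore, subject: str, plaintext: bytes) -> Event:
    nonce = secrets.token_bytes(16)  # fresh nonce per event
    stream = _keystream(store.key_for(subject), nonce, len(plaintext))
    return subject, nonce, _xor(plaintext, stream)


def decrypt(store: KeyStore, event: Event) -> Optional[bytes]:
    subject, nonce, ciphertext = event
    key = store.get(subject)
    if key is None:
        return None  # key was shredded: the payload is unrecoverable
    return _xor(ciphertext, _keystream(key, nonce, len(ciphertext)))


store = KeyStore()
event = encrypt(store, "user-1", b"alice@example.com")
print(decrypt(store, event))  # b'alice@example.com'
store.shred("user-1")
print(decrypt(store, event))  # None
```

This design moves the deletion problem rather than eliminating it. The key store (and its backups) must support genuine deletion, but it is far smaller and easier to govern than every topic that ever carried the data.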