by nemothekid 3119 days ago
>What, none of these people /ever/ foresaw the need to delete some data?

It's a performance trade-off, and not a very surprising one. Hard disk drives have never actually erased data on delete (if you want the data gone, you overwrite it with zeros). It's not unimaginable that this performance trade-off found its way up the stack.

And just like a regular HDD, you can forcibly delete the data; it's just a very expensive operation that isn't needed 95% of the time.


>And just like a regular HDD, you can forcibly delete the data

Except apparently not, because the linked article is literally saying it's not supported.

I get wanting an audit trail, and I get wanting to not delete data if you don't have to for performance reasons, but neither of those things is the same as saying "it's literally not possible to delete stuff".

> Except apparently not, because the linked article is literally saying it's not supported.

Not entirely true. Out of the box (as far as I know; I'm no expert), Kafka keeps records for 7 days and deletes them afterwards.

Most people I know (myself included) use Kafka to keep records for longer, and a good option is what the article describes, which is to compact the logs. In that case the log gets compacted after a configured period of time (or when it reaches a configured size): only the latest message for each id is kept, and all previous messages with the same id are deleted. That's why the process needs a message with a null value to perform the "delete".
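To make the compaction behaviour concrete, here is a minimal sketch in Python. It is a toy model of the idea, not Kafka's implementation: the `compact` function and the `(key, value)` record shape are made up for illustration, and a `None` value stands in for the null-value "delete" message (tombstone).

```python
from typing import Optional

# Hypothetical record shape: (key, value). A value of None is a tombstone.
Record = tuple[str, Optional[str]]


def compact(log: list[Record]) -> list[Record]:
    """Keep only the latest record per key; drop keys whose latest record is a tombstone."""
    latest: dict[str, tuple[int, Optional[str]]] = {}
    for offset, (key, value) in enumerate(log):
        # A later record with the same key overwrites the earlier one.
        latest[key] = (offset, value)

    # Preserve the original log order of the surviving records.
    survivors = sorted(
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    )
    return [(key, value) for _, key, value in survivors]


log = [
    ("user-1", "a@example.com"),
    ("user-2", "b@example.com"),
    ("user-1", "a2@example.com"),  # supersedes the first user-1 record
    ("user-2", None),              # tombstone: delete user-2
]
compacted = compact(log)
print(compacted)
```

In this toy version the tombstone disappears in the same pass. As far as I know, real Kafka keeps tombstones around for a while (see `delete.retention.ms`) so that consumers have a chance to observe the delete before it is cleaned up.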

Only when you want to keep the data forever and can't use compaction is there no way to delete a specific message. (Compaction assumes that your messages always contain the full state of an entity, so the last message always holds the current state and the previous ones can be deleted with no side effects.) I'd have to read the exceptions for backups included in GDPR, but you could make the case that, in this situation, the Kafka log is maintained only as a backup of the data, so that it can be replayed if something downstream breaks.

>Except apparently not, because the linked article is literally saying it's not supported.

I see your point, but you are mistaken (or you took the wrong impression from the article): it is supported, you just wouldn't want to do it day-to-day, and for the context of the article it might as well not exist. In Kafka, if you want to forcibly delete the data, you could simply force topic compaction after a delete. Depending on the size of your data, a regular delete could take hours, which would likely blow up resource usage on any decently sized deployment.

I bring this up because a lot of shiny "BigData" databases use log-structured merge trees, whose on-disk files are immutable, so deletes are mostly "soft deletes" until a compaction.

What is your thought on this?

> Encrypt with a user specific key when the data enters the log. You can effectively delete all the user specific data by throwing the key away. No tracking down files or reprocessing necessary.

from https://news.ycombinator.com/item?id=15847674

From a technical perspective, that seems like a sensible compromise to me.
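A rough sketch of that idea in Python, in case it helps. Everything here is hypothetical (`KeyStore`, `encrypt_for_user`, `decrypt_for_user`), and the cipher is a toy SHA-256 keystream used only to keep the example dependency-free. It is NOT secure; a real system would use an authenticated cipher such as AES-GCM from a vetted library.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only, NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


class KeyStore:
    """Hypothetical per-user key store, kept separate from the log."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def key_for(self, user_id: str) -> bytes:
        # Create a key the first time a user is seen.
        if user_id not in self._keys:
            self._keys[user_id] = secrets.token_bytes(32)
        return self._keys[user_id]

    def get(self, user_id: str) -> bytes:
        return self._keys[user_id]  # raises KeyError once the key is forgotten

    def forget(self, user_id: str) -> None:
        # "Deleting" the user: throw the key away, leave the log untouched.
        self._keys.pop(user_id, None)


def encrypt_for_user(store: KeyStore, user_id: str, plaintext: bytes) -> tuple[bytes, bytes]:
    nonce = secrets.token_bytes(16)
    stream = _keystream(store.key_for(user_id), nonce, len(plaintext))
    return nonce, bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt_for_user(store: KeyStore, user_id: str, record: tuple[bytes, bytes]) -> bytes:
    nonce, ciphertext = record
    stream = _keystream(store.get(user_id), nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


store = KeyStore()
record = encrypt_for_user(store, "user-1", b"a@example.com")  # what goes in the log
roundtrip = decrypt_for_user(store, "user-1", record)

store.forget("user-1")
try:
    decrypt_for_user(store, "user-1", record)
    readable_after_forget = True
except KeyError:
    readable_after_forget = False
```

The log itself never changes; the record is still there, but without the key it is just noise. The catch is that the key store now has to support real deletes, including in its own backups.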