Hacker News
by Sphax 4040 days ago
There's a log compaction cleanup policy, yes. I've never used it myself, but if I'm not mistaken it works like this: you attach a key to each message you send to Kafka, and when Kafka compacts the log it keeps only the latest value for each key.
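To make the compaction semantics concrete, here's a small in-memory Python simulation of what a compacted partition ends up containing. It only models the result, assuming each record is an `(offset, key, value)` tuple; the `compact` function and the record shape are hypothetical, not Kafka's API or its log cleaner. In Kafka itself this behavior is selected per topic with `cleanup.policy=compact`.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: (offset, key, value). Offsets are unique
# and increase in append order.
Record = Tuple[int, str, str]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the newest record for each key, preserving log order.

    Models the outcome of key-based compaction on one partition;
    it is not how Kafka's log cleaner is actually implemented.
    """
    latest: Dict[str, int] = {}
    for offset, key, _value in log:
        latest[key] = offset  # later offsets overwrite earlier ones
    # Survivors keep their original offsets and relative order.
    return [rec for rec in log if latest[rec[1]] == rec[0]]


if __name__ == "__main__":
    log = [
        (0, "user-1", "name=Ann"),
        (1, "user-2", "name=Bob"),
        (2, "user-1", "name=Anna"),
        (3, "user-2", "name=Rob"),
        (4, "user-3", "name=Cy"),
    ]
    for rec in compact(log):  # keeps offsets 2, 3 and 4
        print(rec)
```

Note that the surviving records keep their original offsets, so a compacted log has gaps in its offset sequence rather than being renumbered.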

The other cleanup policy is plain time-based retention: once log segments are older than the configured retention period (minutes, days, weeks), they are simply deleted.
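A similarly rough sketch of time-based retention, assuming each log segment records the timestamp of its newest message. Whole segments are dropped once that timestamp falls outside the retention window, and the segment still being written to is always kept. `Segment` and `apply_retention` are made-up names; the real broker behavior (configured with `cleanup.policy=delete` and `retention.ms`) has more details than this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    base_offset: int   # first offset stored in this segment (hypothetical field)
    newest_ts_ms: int  # timestamp of the newest message in the segment


def apply_retention(segments: List[Segment], now_ms: int,
                    retention_ms: int) -> List[Segment]:
    """Delete whole closed segments whose newest message is too old.

    Simplification: the last (active) segment is always kept because
    it is still being appended to.
    """
    if not segments:
        return []
    *closed, active = segments
    kept = [s for s in closed if now_ms - s.newest_ts_ms <= retention_ms]
    return kept + [active]


if __name__ == "__main__":
    DAY = 86_400_000  # milliseconds per day
    segments = [Segment(0, 1 * DAY), Segment(100, 5 * DAY), Segment(200, 9 * DAY)]
    kept = apply_retention(segments, now_ms=10 * DAY, retention_ms=7 * DAY)
    print([s.base_offset for s in kept])  # [100, 200]
```

Deletion happens a segment at a time, so some messages can outlive the retention period until the rest of their segment expires too.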

That sounds great if each message in the log holds the complete state for its key, but I don't see how to use compaction if the messages are change events.
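A quick illustration of the problem, assuming hypothetical change events that each add an amount to an account balance. Rebuilding the state means summing every delta, but compaction keeps only the last event per key, so the earlier deltas (and with them the correct total) are lost. The event shape and the `compact`/`fold` helpers are made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical change event: (offset, key, delta), where delta is an
# amount added to that key's balance.
Event = Tuple[int, str, int]


def compact(log: List[Event]) -> List[Event]:
    """Keep only the newest event per key (key-compaction semantics)."""
    latest: Dict[str, int] = {}
    for offset, key, _ in log:
        latest[key] = offset
    return [e for e in log if latest[e[1]] == e[0]]


def fold(log: List[Event]) -> Dict[str, int]:
    """Rebuild current state by summing every delta for each key."""
    state: Dict[str, int] = {}
    for _, key, delta in log:
        state[key] = state.get(key, 0) + delta
    return state


if __name__ == "__main__":
    log = [(0, "acct-1", 100), (1, "acct-1", -30), (2, "acct-1", 5)]
    print(fold(log))           # {'acct-1': 75}
    print(fold(compact(log)))  # {'acct-1': 5}: earlier deltas are gone
```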

Is there a system designed for snapshotting the aggregate and logging the delta?

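A common pattern is to periodically publish a "checkpoint" message that holds the full aggregate state, so a consumer can start from the latest checkpoint and apply only the deltas after it. I'm not sure whether the concept is built into Kafka.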
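A minimal sketch of the checkpoint idea, assuming a made-up message format where a checkpoint carries the whole aggregate state and a delta carries one change. The `rebuild` and `make_checkpoint` helpers are hypothetical and nothing here is a Kafka API; the point is that a consumer only needs the newest checkpoint plus the deltas after it.

```python
from typing import Any, Dict, List

# Hypothetical message format for one aggregate (not a Kafka API):
#   {"type": "checkpoint", "state": {...}}    full snapshot of the aggregate
#   {"type": "delta", "key": k, "amount": n}  a single change event


def make_checkpoint(state: Dict[str, int]) -> Dict[str, Any]:
    """Build a checkpoint message holding a copy of the full current state."""
    return {"type": "checkpoint", "state": dict(state)}


def rebuild(log: List[Dict[str, Any]]) -> Dict[str, int]:
    """Start from the newest checkpoint, then apply only the deltas after it."""
    state: Dict[str, int] = {}
    start = 0
    # Scan backwards for the most recent checkpoint.
    for i in range(len(log) - 1, -1, -1):
        if log[i]["type"] == "checkpoint":
            state = dict(log[i]["state"])  # copy so the log is not mutated
            start = i + 1
            break
    for msg in log[start:]:
        if msg["type"] == "delta":
            state[msg["key"]] = state.get(msg["key"], 0) + msg["amount"]
    return state


if __name__ == "__main__":
    log = [
        {"type": "delta", "key": "acct-1", "amount": 100},
        {"type": "delta", "key": "acct-1", "amount": -30},
        make_checkpoint({"acct-1": 70}),
        {"type": "delta", "key": "acct-1", "amount": 5},
    ]
    print(rebuild(log))  # {'acct-1': 75}
```

Because `rebuild` ignores everything before the newest checkpoint, older messages could be left to ordinary time-based retention, at least for consumers that only need the current state.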
It's easy to store messages in HDFS or S3 for long-term storage, and just as easy to replay them from there if you need to re-ingest data later on.
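As a stand-in for HDFS or S3, this sketch archives messages to a local JSON-lines file and reads them back in order for re-ingestion. The file format and the `archive`/`replay` helpers are assumptions for illustration; a real setup would write through an HDFS or S3 client instead of the local filesystem.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator


def archive(messages: Iterable[Dict[str, Any]], path: Path) -> int:
    """Append messages to a JSON-lines file; returns how many were written."""
    count = 0
    with path.open("a", encoding="utf-8") as f:
        for msg in messages:
            f.write(json.dumps(msg) + "\n")
            count += 1
    return count


def replay(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield archived messages in the order they were written."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                yield json.loads(line)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "topic-archive.jsonl"  # hypothetical archive file
        archive([{"key": "acct-1", "amount": 100},
                 {"key": "acct-1", "amount": -30}], path)
        for msg in replay(path):
            print(msg)  # re-ingest each message here
```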