Hacker News new | ask | show | jobs
by whartung 1219 days ago
Is there an elegant solution to the garbage collection problem that append only DBs have to deal with?
2 comments

This is most exhaustively talked about in the context of Log-Structured Merge Trees, like LevelDB and RocksDB. They essentially structure the append-only/write-once chunks into tiers of generations (each compaction shifting the still-alive data to a more long-lived tier), reminiscent of generational GC except now shaped like a tree.

https://en.wikipedia.org/wiki/Log-structured_merge-tree

https://en.wikipedia.org/wiki/Tracing_garbage_collection#Gen...

I think what bitcask proposes (to routinely collect datafiles and mark them as stale and GC them later) is quite a simple solution. It would work for a lot of usecases.