This is most exhaustively talked about in the context of Log-Structured Merge Trees, like LevelDB and RocksDB. They essentially structure the append-only/write-once chunks into tiers of generations (each compaction shifting the still-alive data to a more long-lived tier), reminiscent of generational GC except now shaped like a tree.
I think what bitcask proposes (to routinely collect datafiles and mark them as stale and GC them later) is quite a simple solution. It would work for a lot of usecases.
https://en.wikipedia.org/wiki/Log-structured_merge-tree
https://en.wikipedia.org/wiki/Tracing_garbage_collection#Gen...