by Scriptor 2580 days ago
Haven't log-based file systems been a thing for a while now? https://en.wikipedia.org/wiki/Log-structured_file_system

They have, but they had fallen way out of favor for a couple of reasons - write amplification, cost/complexity of segment cleaning, poor performance for some workloads, etc. They've experienced a bit of a resurgence thanks to flash and SMR, though. Personally, I've always thought log-structured filesystems were elegant and wish they'd been more actively developed during the long "winter" during which their COW/write-anywhere cousins stole all the limelight.
I was hoping there would be more research into log-structured filesystems. For example:

- Borrow some ideas from generational garbage collection: keep a young generation on SSD (or mirrored in RAM) and use a copying GC to get rid of old versions of fast-changing data blocks.

- Use deduplication techniques based on content signatures.
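
To make the two ideas above concrete, here is a minimal sketch and not a real filesystem design. The class `GenerationalLog`, its fields, and the `young_capacity` parameter are all hypothetical names made up for illustration. The young generation is a plain dict standing in for SSD or RAM, the "copying GC" just copies the surviving version of each block into an append-only old log, and SHA-256 is assumed as the content signature.

```python
import hashlib


class GenerationalLog:
    """Toy log store: a young generation absorbs overwrites, an old log is append-only."""

    def __init__(self, young_capacity=4):
        self.young = {}        # block_id -> latest data (stand-in for the SSD/RAM young generation)
        self.old_log = []      # append-only records of (block_id, content digest)
        self.blocks = {}       # content digest -> data, each distinct payload stored once
        self.young_capacity = young_capacity
        self.overwritten = 0   # versions that died young and never reached the old log

    def write(self, block_id, data):
        if block_id in self.young:
            self.overwritten += 1
        self.young[block_id] = data
        if len(self.young) >= self.young_capacity:
            self.collect()

    def collect(self):
        """Copying collection: only the live version of each young block is promoted."""
        for block_id, data in self.young.items():
            digest = hashlib.sha256(data).hexdigest()
            self.blocks.setdefault(digest, data)   # dedup: identical content is kept once
            self.old_log.append((block_id, digest))
        self.young.clear()

    def read(self, block_id):
        if block_id in self.young:
            return self.young[block_id]
        # Scan the old log from newest to oldest record.
        for bid, digest in reversed(self.old_log):
            if bid == block_id:
                return self.blocks[digest]
        raise KeyError(block_id)


log = GenerationalLog(young_capacity=3)
for i in range(5):
    log.write("hot", b"v%d" % i)   # a fast-changing block, rewritten in place in the young generation
log.write("a", b"same")
log.write("b", b"same")            # third distinct block triggers a collection
print(len(log.old_log), len(log.blocks), log.overwritten)
```

In this toy run the four stale versions of the hot block never reach the old log, and the two blocks with identical content share one stored payload.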

I think elements of that generationality are the foundations of the Log Structured Merge Trees used by KV stores like LevelDB and RocksDB. At least I think that's the same concept; I'm not well read on filesystems.
Yes, LSMT is a good example of pushing the idea of a hybrid append log and in-memory data structure further.

However, LSMT is meant for relatively small data sets, i.e. ordered key-value data. It has worse write amplification than a simple append log. The level-0 memtable flushed to the write-ahead log counts as one write. Writing to the level-1 sorted files counts as a second write. Merging the sorted files counts as a third. That makes 2 to 3 writes per change.
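
A rough way to see the write count is to tally physical record writes in a toy model. This is only a sketch under simplifying assumptions: `ToyLSM` and `memtable_limit` are hypothetical names, it is not how LevelDB or RocksDB is implemented, each put is assumed to cost one log append, each flush rewrites the memtable as a sorted run, and there is a single merge step.

```python
class ToyLSM:
    """Toy LSM tree that counts physical record writes per logical put."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.wal = []            # write-ahead log records
        self.memtable = {}       # in-memory key -> value
        self.runs = []           # sorted runs on "disk", oldest first
        self.logical_writes = 0
        self.physical_writes = 0

    def put(self, key, value):
        self.logical_writes += 1
        self.wal.append((key, value))     # write 1: append to the log
        self.physical_writes += 1
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        run = sorted(self.memtable.items())
        self.physical_writes += len(run)  # write 2: memtable written out as a sorted run
        self.runs.append(run)
        self.memtable = {}
        self.wal.clear()

    def merge(self):
        merged = {}
        for run in self.runs:             # later runs overwrite earlier ones
            merged.update(run)
        out = sorted(merged.items())
        self.physical_writes += len(out)  # write 3: merged run rewritten
        self.runs = [out]

    def amplification(self):
        return self.physical_writes / self.logical_writes


lsm = ToyLSM(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
    lsm.put(key, value)
lsm.merge()
print(lsm.amplification())
```

With four distinct keys and one merge, every record is written three times in this model, whereas a plain append log would write each record once. Real systems with several levels merge more than once, so the figure here is a lower bound for the toy case only.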

It also doesn't help with the problem of frequently updated blocks. Every version of a changed block is written to disk, and a merge is needed to get rid of the old versions.
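
As a small illustration of that point, here is a sketch of a merge over sorted runs, assuming the runs are ordered oldest to newest and that the newest version of a key wins. The function name `merge_runs` and the sample block names are made up for the example.

```python
def merge_runs(runs):
    """Merge sorted runs (oldest first), keeping only the newest version of each key."""
    latest = {}
    for run in runs:
        for key, value in run:
            latest[key] = value   # a newer run overwrites the older version
    return sorted(latest.items())


# Block "blk7" is updated three times; every version sits on disk until the merge.
runs = [
    [("blk7", "v1")],
    [("blk7", "v2"), ("blk9", "a")],
    [("blk7", "v3")],
]
records_before = sum(len(run) for run in runs)
merged = merge_runs(runs)
print(records_before, merged)
```

Before the merge there are four records on disk for two live blocks; only the merge reclaims the two stale versions of `blk7`.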

But it has a number of good implementation ideas that can be borrowed.