by Hei1Fuya 2785 days ago
Copy-on-write filesystems can probably be optimized for SMR by using TRIM commands to punch holes and then rewriting the live content sequentially into a new zone. AFAIK both ZFS and Btrfs have plans to do this.
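A minimal sketch of that cleaning step, simulated in Python rather than run against a real drive. The `Zone` class and `clean` function are hypothetical. "Trim" here only marks a block dead, and `reset` stands in for whatever command the drive actually uses to rewind a zone.

```python
from dataclasses import dataclass, field

@dataclass
class Zone:
    """Simplified SMR zone (hypothetical): blocks can only be appended at the write pointer."""
    capacity: int
    blocks: list = field(default_factory=list)  # payloads, in write order
    live: list = field(default_factory=list)    # False once a block has been trimmed

    def append(self, data):
        if len(self.blocks) >= self.capacity:
            raise IOError("zone full")
        self.blocks.append(data)
        self.live.append(True)
        return len(self.blocks) - 1              # offset == write pointer before the append

    def trim(self, offset):
        # "Punch a hole": mark the block dead without rewriting anything in place.
        self.live[offset] = False

    def reset(self):
        # Rewind the write pointer so the whole zone can be rewritten from its start.
        self.blocks.clear()
        self.live.clear()

def clean(src: Zone, dst: Zone) -> dict:
    """Copy src's live blocks sequentially into dst, then reset src.
    Returns {old offset: new offset} so the caller can update its block pointers."""
    remap = {}
    for off, (data, alive) in enumerate(zip(src.blocks, src.live)):
        if alive:
            remap[off] = dst.append(data)
    src.reset()
    return remap
```

Filling a zone with four blocks, trimming the middle two, and cleaning into a fresh zone leaves the two survivors packed at offsets 0 and 1, and the old zone empty and writable again.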

That way they can be useful for more than archival.


You probably mean log-structured, not copy-on-write. CoW by itself doesn't make writes sequential; log-structured filesystems do.
Log-structured is a special case of CoW: specifically, it's CoW where the allocation strategy is to place new blocks sequentially.
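A hypothetical Python model of the distinction in the last two comments: both stores below are copy-on-write (an update always goes to a fresh physical block), and only the allocator differs. The class and allocator names are illustrative, not taken from any real filesystem.

```python
from typing import Callable, Dict, List

class CowStore:
    """Copy-on-write block store: an update never overwrites the old physical block;
    it writes a new one and repoints the logical -> physical map."""

    def __init__(self, allocate: Callable[[], int]):
        self.allocate = allocate           # allocation strategy is pluggable
        self.map: Dict[int, int] = {}      # logical block -> physical block
        self.write_log: List[int] = []     # physical blocks in the order they were written

    def write(self, logical: int) -> None:
        phys = self.allocate()             # always a fresh block: this is the CoW part
        self.write_log.append(phys)
        self.map[logical] = phys

def sequential_allocator() -> Callable[[], int]:
    # Hands out 0, 1, 2, ...: CoW plus this allocator is log-structured.
    nxt = 0
    def alloc() -> int:
        nonlocal nxt
        nxt += 1
        return nxt - 1
    return alloc

def scattered_allocator(free: List[int]) -> Callable[[], int]:
    # Hands out free blocks wherever they happen to be, as a general-purpose allocator may.
    return lambda: free.pop()
```

With the same logical writes, the sequential allocator produces physical writes in order, while the scattered one produces out-of-order writes that a sequential-only zone could not accept, even though both stores are equally "CoW".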
I was thinking that F2FS might be a good base filesystem on which to run an object storage abstraction layer (like Ceph)...

However, since I first saw the news about these drives a few days ago, Samsung (where F2FS was developed) has also axed some Linux devs, which gives me pause and makes me reconsider the long-term viability of this filesystem...

https://en.wikipedia.org/wiki/F2FS

A full-blown filesystem is overkill for an object store. You could use something like libzbc ( https://github.com/hgst/libzbc ) to write directly to the SMR drives at the block level.
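As a rough illustration of what a filesystem-free object store on zoned drives involves, here is a hypothetical Python sketch that simulates zones as byte buffers. It does not use libzbc, and none of its names come from libzbc's API. Every object is appended at the current zone's write pointer, and an in-memory index records where each object lives.

```python
class ZonedObjectStore:
    """Hypothetical object store that writes objects straight into zones, no filesystem."""

    def __init__(self, num_zones: int, zone_size: int):
        self.zone_size = zone_size
        self.zones = [bytearray() for _ in range(num_zones)]  # contents up to each write pointer
        self.current = 0                                      # zone currently being filled
        self.index = {}                                       # key -> (zone, offset, length)

    def put(self, key: str, data: bytes) -> None:
        if len(data) > self.zone_size:
            raise ValueError("object larger than a zone")
        if len(self.zones[self.current]) + len(data) > self.zone_size:
            self.current += 1                                 # move on to the next empty zone
            if self.current >= len(self.zones):
                raise IOError("no empty zones left")
        zone = self.zones[self.current]
        self.index[key] = (self.current, len(zone), len(data))
        zone.extend(data)                                     # sequential append only

    def get(self, key: str) -> bytes:
        z, off, length = self.index[key]
        return bytes(self.zones[z][off:off + length])
```

Rewriting a key simply appends a new copy and repoints the index. The old copy becomes garbage that some cleaner would later have to reclaim zone by zone.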

I believe Ceph has now abstracted the drives away through BlueStore, which simply puts a large RocksDB database on the drive, bypassing most of the functionality a filesystem offers. It should be much easier to make an SMR-compatible version of RocksDB's LSM-tree backend than to write a full-blown filesystem.
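The reason an LSM tree is a natural fit is that it never updates stored data in place. A toy Python sketch of that property, assuming a simple flush-on-full policy; it is not RocksDB's implementation, which adds a write-ahead log, compaction (itself a sequential rewrite of runs), and much more.

```python
import bisect

class TinyLSM:
    """Toy LSM tree: writes go to an in-memory table; when it fills, it is written out
    as an immutable sorted run. Runs are only appended, never modified in place."""

    def __init__(self, memtable_limit: int = 4):
        self.memtable = {}
        self.limit = memtable_limit
        self.runs = []                     # sorted [(key, value), ...] runs, oldest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.flush()

    def flush(self):
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))  # one sequential write
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):                      # newest run wins
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

An update to an existing key lands in a newer run, and reads check runs from newest to oldest, so the stale value is shadowed rather than overwritten.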

It's not accurate to say that BlueStore is just a large RocksDB...

RocksDB is one of several possible backends for object maps (omaps). There is a lot more to BlueStore than just omaps.

Also, BlueStore was actually designed with SMR drives in mind; however, certain components of it are best placed on solid-state media.

I assume that the drive firmware remaps all your writes to make them sequential anyway for increased write performance.
For drive-managed SMR drives, yes, but these seem to be host-managed ones. So the filesystem has to be aware of the zones.
> ... the filesystem has to be aware of the zones.

Does that mean the drives simply will not work with an unaware filesystem or that it will work but performance will be poor?

They will not work at all. You have to issue special commands to the drive (for example, resetting a zone's write pointer) before you can overwrite a zone.
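To make the "will not work at all" point concrete, here is a hypothetical Python model of the rule a sequential-write-required zone on a host-managed drive enforces. The class and method names are illustrative. On real hardware the drive fails a write that does not start at the zone's write pointer, and rewinding the pointer is done with the ZBC/ZAC RESET WRITE POINTER command.

```python
class HostManagedZone:
    """Simplified sequential-write-required zone. The drive tracks a write pointer and
    rejects any write that does not start exactly there; overwriting requires resetting
    the zone, which discards everything in it."""

    def __init__(self, start_lba: int, size: int):
        self.start = start_lba
        self.size = size
        self.write_pointer = start_lba

    def write(self, lba: int, nblocks: int) -> None:
        if lba != self.write_pointer:
            # What an unaware filesystem hits when it tries to update a block in place.
            raise IOError(f"unaligned write at {lba}, write pointer is {self.write_pointer}")
        if lba + nblocks > self.start + self.size:
            raise IOError("write crosses zone boundary")
        self.write_pointer += nblocks

    def reset_write_pointer(self) -> None:
        self.write_pointer = self.start
```

A filesystem that assumes it can rewrite any block in place will hit the first error on its first metadata update, which is why zone awareness is a hard requirement rather than a performance tweak.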
It still seems random access might be an issue. But I'd love to see how, e.g., NILFS2 (maybe on top of software RAID) benchmarks against ZFS on these big drives.