| That's a great question. The 600 ns figure represents our optimized write path, not a full fsync operation. We achieve it, among other things, through: (1) as mentioned, we are not using a traditional filesystem, and we bypass several VFS layers; (2) free space management is a combination of two RB trees, providing O(log n) for slice and O(log n + k) for merge, k being the number of adjacent free spaces; (3) the majority of the write path employs a lock-free design, and where needed we use per-CPU write buffers. The transactional guarantees we provide are via: (1) atomic individual operations with retries; (2) various conflict resolution strategies (timestamp, etc.); (3) durability through controlled persistence cycles with configurable commit intervals. Depending on the plan, we provide a persistence guarantee of between 30 seconds and 5 minutes. |
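To make the two-index free space scheme above concrete, here is a minimal sketch, not the poster's implementation. It assumes one index ordered by offset (to find adjacent extents for merging) and one ordered by size (to find an extent to slice). Plain sorted lists stand in for the RB trees, so insertion here is O(n) rather than O(log n); the sketch shows the indexing idea, not the stated complexity. The names `FreeSpace`, `allocate`, and `release` are hypothetical.

```python
import bisect


class FreeSpace:
    """Free-extent tracker with two ordered indexes (hypothetical sketch)."""

    def __init__(self, capacity):
        # by_offset holds (offset, length); by_size holds (length, offset).
        # Sorted lists are an assumption standing in for the two RB trees.
        self.by_offset = [(0, capacity)]
        self.by_size = [(capacity, 0)]

    def _add(self, offset, length):
        bisect.insort(self.by_offset, (offset, length))
        bisect.insort(self.by_size, (length, offset))

    def _remove(self, offset, length):
        self.by_offset.pop(bisect.bisect_left(self.by_offset, (offset, length)))
        self.by_size.pop(bisect.bisect_left(self.by_size, (length, offset)))

    def allocate(self, size):
        """Slice `size` units off the smallest extent that fits.

        Returns the starting offset, or None if nothing is large enough.
        """
        i = bisect.bisect_left(self.by_size, (size, 0))
        if i == len(self.by_size):
            return None
        length, offset = self.by_size[i]
        self._remove(offset, length)
        if length > size:
            # Put the unused tail back into both indexes.
            self._add(offset + size, length - size)
        return offset

    def release(self, offset, size):
        """Return an extent and merge it with adjacent free neighbors."""
        i = bisect.bisect_left(self.by_offset, (offset, 0))
        # Merge with the free extent that starts right after this one.
        if i < len(self.by_offset) and self.by_offset[i][0] == offset + size:
            next_off, next_len = self.by_offset[i]
            self._remove(next_off, next_len)
            size += next_len
        # Merge with the free extent that ends right before this one.
        if i > 0:
            prev_off, prev_len = self.by_offset[i - 1]
            if prev_off + prev_len == offset:
                self._remove(prev_off, prev_len)
                offset, size = prev_off, size + prev_len
        self._add(offset, size)
```

Allocating three extents and releasing them in any order should leave a single free extent covering the whole range, which is the merge behavior the comment describes.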
A write operation on an SSD takes tens of µs, and that is without any VFS layers.