Hacker News new | ask | show | jobs
by avinassh 909 days ago
> To sum up, the key takeaway is to prefer sequential access wherever we can.

For spinning disks, this is obvious. But why is sequential access faster on SSDs?

5 comments

TLDR; Garbage Collection.

In an SSD, a write operation can only be done when the page is already erased. However, the unit of read/write operations are a page, while the unit of erase operation is a block. That means for a disk write, a naive implementation needs to read the whole block, erase the block, then write updated data back to the block, which is unacceptable. Furthermore, blocks should wear out uniformly, otherwise, the SSD would lose capacity.

To tackle these problems, SSD introduces Flash Translation Layer (FTL) which helps to build an illusion of random access device. To achieve this, FTL employs an approach very similar to LSM trees. Writes are always written to new, already erased pages, while in the background, garbage collects (GC) outdated data. FTL needs to keep a map from the user’s logical address to physical address on SSD, both in-memory and persistently.

So to answer your question, why are sequential writes are faster than random writes on SSDs? Because the address map table is smaller since new data is consecutive in larger chunks. Garbage Collection is simpler and only metadata needs to be updated. Erasing a block is required anyway.

what about sequential vs random reads?
Predictable prefetching.
In practice the difference is much smaller between random and sequential reads, with the caveat that all SSD reads on some level are block operations, so locally sequential access is what make the big difference.

Though with mmap the OS may do a speculative async readahead, depending on memadvise.

Like sibling said, physical pages are typically much bigger than logical pages in SSDs. Also drives do prediction and sequential access is easy to predict.
Not just SSD, but in RAM also it's faster, for the same reasons (page-based access to caches). Basically you should always use a B-tree whenever you need the functionality of an STL map.
I understand it is faster on RAM, but I am asking why

IOW what makes it faster to access sequentially on SSD

It is faster in RAM, on SSD, on HDD, for the same reason: the reads are done in blocks. Even if you want a single byte, the system (the controller) will read an entire block, typically on the order of 100k bytes. So if your B-tree node is all stored in one block, all reads from that node will be fast.
I'd assume at some level of IO, blocks/pages/whole buffers are being read, as opposed to reading bytes one by one. So the sequental access takes advantage of this I suppose.