by jedberg 1830 days ago
This page tells me a lot about SSDs, but it doesn't tell me why I need to know these things. It doesn't really give me any indication about how I should change my behavior if I know that I'll be running on SSD vs spinning disk.

I've always been told, "just treat SSDs like slow, permanent memory".

4 comments

For instance, when reading this, SQLite immediately came to mind, and how much a loop of 10,000 inserts without BEGIN/COMMIT or some preparatory pragmas would wreck an SSD (it forces a full sync after every single insert).
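A minimal sketch of the pattern described above, using Python's standard sqlite3 module. The table name `t`, the helper names `insert_rows` and `compare`, and the row count are made up for illustration. With `isolation_level=None` the module does not open transactions on its own, so each INSERT in the unbatched case commits separately, while the batched case wraps the whole loop in one explicit BEGIN/COMMIT. How large the timing gap is depends on the drive, the filesystem, and SQLite's synchronous and journal settings, so no numbers are claimed here.

```python
import os
import sqlite3
import tempfile
import time


def insert_rows(path, n, batched):
    """Insert n rows and return (row_count, elapsed_seconds)."""
    # isolation_level=None: the sqlite3 module issues no implicit BEGIN,
    # so transactions are controlled only by the explicit statements below.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, v TEXT)"
        )
        start = time.perf_counter()
        if batched:
            conn.execute("BEGIN")  # one transaction around the whole loop
        for i in range(n):
            # Without the BEGIN above, each INSERT is its own transaction
            # and is committed (and synced) individually.
            conn.execute("INSERT INTO t (v) VALUES (?)", (f"row {i}",))
        if batched:
            conn.execute("COMMIT")
        elapsed = time.perf_counter() - start
        count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    finally:
        conn.close()
    return count, elapsed


def compare(n=1000):
    """Run both variants against fresh database files in a temp directory."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for label, batched in (("autocommit", False), ("batched", True)):
            path = os.path.join(tmp, f"{label}.db")
            results[label] = insert_rows(path, n, batched)
    return results


if __name__ == "__main__":
    for label, (count, elapsed) in compare().items():
        print(f"{label}: {count} rows in {elapsed:.3f}s")
```

Both variants end up with the same rows; only the number of commits differs.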
Not really though, because your kernel would most likely abstract that away and bunch up the writes.
The kernel can't optimize that because sqlite is specifically requesting it to force a write.
Yes but you can configure the kernel to ignore that, and by default it does.

For example, way back in the day, to get more life out of my laptop during college, I configured the kernel to only write to disk once an hour or when the buffer filled up. That effectively meant I was only writing to disk once per hour when I shut down to change classes.

The modern linux kernel doesn't actually write to disk when fsync is called. It buffers the writes in a cache. Also, the SSD itself has a cache.

There are lots of abstractions between SQLite and the disk.

>The modern linux kernel doesn't actually write to disk when fsync is called

Source for this? This seems to be contradicted by the man page for fsync

https://man7.org/linux/man-pages/man2/fdatasync.2.html

       fsync() transfers ("flushes") all modified in-core data of (i.e.,
       modified buffer cache pages for) the file referred to by the file
       descriptor fd to the disk device (or other permanent storage
       device) so that all changed information can be retrieved even if
       the system crashes or is rebooted.  This includes writing through
       or flushing a disk cache if present.  The call blocks until the
       device reports that the transfer has completed.
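
For reference, this is roughly what requesting that flush looks like from an application. It is a sketch only: `durable_write` is a hypothetical helper name, and it covers just the file's contents, not the directory entry for a newly created file.

```python
import os


def durable_write(path, data):
    """Write bytes to path and block until the kernel reports them flushed."""
    with open(path, "wb") as f:
        f.write(data)
        # flush() only moves Python's userspace buffer into the kernel's
        # page cache; the data is not yet guaranteed to be on the device.
        f.flush()
        # os.fsync() wraps the fsync() call quoted above and returns once
        # the kernel reports that the transfer has completed.
        os.fsync(f.fileno())
```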
>I configured the kernel to only write to disk once an hour or when the buffer filled up. That effectively meant I was only writing to disk once per hour when I shut down to change classes.

Sounds great until you get a kernel panic or random shutdown, in which case you potentially get file corruption and/or data loss.

> The modern linux kernel doesn't actually write to disk when fsync is called. It buffers the writes in a cache.

Do you have a reference for this? That would break every ACID database that I'm aware of, including sqlite and postgresql. There has been a lot of work in the last few years to fix data durability issues with fsync (e.g. https://lwn.net/Articles/752063/), so I would be very surprised to hear that fsync is now a no-op.

> you can configure the kernel to ignore that, and by default it does.

> The modern linux kernel doesn't actually write to disk when fsync is called.

This is false.

Almost all open source databases' durability guarantees are based upon fsync (including SQLite, Postgres, MySQL, and so on). fsync will result in the corresponding underlying storage flush commands. You can configure Linux to ignore fsync, but this is not the default on any Linux distribution I'm aware of. It would not make any sense.

> The modern linux kernel doesn't actually write to disk when fsync is called. It buffers the writes in a cache.

That's not true. You can tell in many ways, but one of the easiest is that fsync is quite slow and noisy (on hard drives).

I would be a bit disappointed if the kernel implementation for HDD and SSD is exactly the same.
Fortunately most people aren't running OLTP workloads on client SSDs. That's mostly done on enterprise SSDs that have much higher endurance. That said even on client SSDs you can probably get away with running such workloads as long as you're not doing them 24/7.
More important than the higher rated endurance (and perhaps contributing a bit to that rating) is the fact that the typical enterprise SSD has power loss protection capacitors for its RAM, so it can cache and combine writes in RAM safely.
yes, it's a weak post

it's really about linking to the tutorial and papers it lists at the end, which is something from 2014

And that was discussed here 6 years ago: https://news.ycombinator.com/item?id=9049630

Indeed. The summary talks about what you need to do to saturate an SSD's read and write bandwidth. I guess the post would find its audience better if the title was "What a programmer should know about SSDs when optimizing IO".

I'd be more interested in what the trends in SSD behaviour are. It seems SSDs have bigger and bigger DRAM caches and wear ceased to be an issue many years ago, so there's not much payoff in the write-side advice of the article.

Actually wear becomes increasingly important as DRAM caches are removed to save money. And SSDs tend to have less write volume per unit
yeah, article should talk about periodic TRIMming, though this is more admin advice
I have found trim is not sufficient, at least on Windows; we still occasionally need to defragment SSDs from what I can tell.

On a Windows server we were having SSD performance issues where sequential reads were often down to 100MB/s. It was confusing, and we tried all sorts of ways to copy the files with the same result. I eventually tested the drive with a fragmentation tool and fragmentation was really high at 80%, but most importantly the problem files had so many fragments that their reads were tending towards 4K IO.

What I did was move all the files to another drive, force-trim the drive, give it several hours to sort itself out, and then copy the files back. Performance was restored to 550MB/s, as would be expected.

I wrote a quick Go program to test the sequential read speed of all files across all the drives and I found plenty of files where performance was degraded. This was across a range of SSDs I had, SATA and NVMe, from differing vendors. I suspect this is a bigger problem than most people realise: normal use absolutely can get the drive into a badly performing state and trim won't fix it. Very few people expect that a drive will degrade down to its 4K IO speed on a sequential copy, but it apparently can.
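
The Go program itself isn't shown, so here is a rough Python sketch of the same idea rather than a reconstruction of it. The function names, the 1 MiB chunk size, and the `min_size` filter are assumptions for illustration. Note that files already in the OS page cache will report inflated speeds, so results are only meaningful for cold files or after dropping caches.

```python
import os
import time


def sequential_read_speed(path, chunk_size=1024 * 1024):
    """Read one file front to back; return (bytes_read, megabytes_per_second)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    speed = (total / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total, speed


def scan_tree(root, min_size=0):
    """Measure every readable file under root; slowest files come first."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if os.path.getsize(full) < min_size:
                    continue  # tiny files say little about sequential speed
                total, speed = sequential_read_speed(full)
            except OSError:
                continue  # skip files that vanish or cannot be opened
            results.append((speed, total, full))
    results.sort()
    return results
```

Calling `scan_tree` on a drive's root with a `min_size` of a few hundred megabytes and looking at the first few entries would surface the kind of slow files described above.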

Don't modern OSes transparently TRIM periodically anyway?
Yes, although you have to set it up manually if you’re using a more bare-bones Linux distribution or something like that.