by scottlamb 83 days ago
> Many consumer SSDs, especially DRAMless ones (e.g., Apacer AS350 1TB, but also seen on Crucial SSDs), under synchronous writes, will regularly produce latency spikes of 10 seconds or more, due to the way they need to manage their cells.

Is there an experiment you'd recommend to reliably show this behavior on such an SSD (or ideally to become confident a given SSD is unaffected)? Is it as simple as writing flat-out for, say, 10 minutes with O_DIRECT so you can easily measure the latency of individual writes? Do you need a certain level of concurrency, or a mixed read/write load? Repeated writes to a small region vs. writes to a large region (or maybe that doesn't matter, given remapping)? Is this a one-liner with `fio`? Does it depend on longer-term state, such as how much of the SSD's capacity has been written and not TRIMmed?

Also, what could one do in advance to know if they're about to purchase such an SSD? You mentioned one affected model. You mentioned DRAMless too, but do consumer SSD spec sheets generally say how much DRAM (if any) the devices have? Are there known unaffected consumer models? It'd be a shame to jump to enterprise prices to avoid this if that's not necessary.

I have a few consumer SSDs around that I've never really pushed; it'd be interesting to see if they have this behavior.

2 comments

> Also, what could one do in advance to know if they're about to purchase such an SSD? You mentioned one affected model.

Typically QLC is significantly worse at this than TLC, since the "real" write speed is very low. In my experience any QLC drive is very susceptible to long pauses in write-heavy scenarios.

It does depend on the controller though. As an example, check out the sustained write benchmark graph here[1]: you can see that a number of models start this oscillating pattern after exhausting the pseudo-SLC buffer, indicating the controller is taking a time-out to rearrange things in the background. Others do it too, but more irregularly.

> You mentioned DRAMless too, but do consumer SSD spec sheets generally say how much DRAM (if any) the devices have?

I rely on TechPowerUp. As an example, compare the Samsung 970 Evo[2] to the 990 Evo[3] under the DRAM cache section.

[1]: https://www.tomshardware.com/pc-components/ssds/samsung-990-... (second image in IOMeter graph)

[2]: https://www.techpowerup.com/ssd-specs/samsung-970-evo-1-tb.d...

[3]: https://www.techpowerup.com/ssd-specs/samsung-990-evo-plus-1...

> Is there an experiment you'd recommend to reliably show this behavior on such an SSD?

  fio --name 4k-write --rw=write --bs=4k --size=1G --filename=fio.file --ioengine=libaio --sync=1 --time_based --runtime=60 --write_iops_log=ssd --log_avg_msec=1000 --randrepeat=0 --refill_buffers=1
Then examine ssd_iops.1.log.

Results from Apacer AS350 1TB: https://pastebin.com/F6pr5g29 - the first field is the timestamp in milliseconds, the second one is the write IOs completed since the previous line.

EDIT: I was told that the test above is invalid and that I should add --direct=1. OK, here is the new log, showing the same: https://pastebin.com/Wyw6r9TC - note that some timestamps are completely missing, indicating that the SSD performed zero IOs in that second.
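
If you'd rather not eyeball the log for missing seconds, a rough Python sketch like the one below can list the gaps. It assumes only what is described above: comma-separated lines whose first field is a timestamp in milliseconds, and seconds with zero completed IOs simply not appearing in the log. The function name, the 500 ms slack, and the sample data are made up for illustration and are not taken from the pastebin logs.

```python
def find_stalls(log_text, interval_ms=1000, slack_ms=500):
    """Return (prev_ms, next_ms, gap_ms) for every gap between consecutive
    log entries that is longer than interval_ms + slack_ms.

    interval_ms should match --log_avg_msec; slack_ms is an arbitrary
    tolerance for timestamps that drift by a few milliseconds.
    """
    timestamps = []
    for line in log_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        # Skip blank or malformed lines instead of failing on them.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        timestamps.append(int(fields[0]))

    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > interval_ms + slack_ms:
            stalls.append((prev, cur, gap))
    return stalls


# Made-up sample in the assumed layout, NOT real measurements.
SAMPLE_LOG = """\
1000, 250, 1, 0
2000, 240, 1, 0
5001, 3, 1, 0
6001, 260, 1, 0
"""

for prev, cur, gap in find_stalls(SAMPLE_LOG):
    print(f"no log entries between {prev} ms and {cur} ms ({gap} ms gap)")
```

To run it on a real log, pass the file contents instead of the sample, e.g. `find_stalls(open("ssd_iops.1.log").read())`.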

You may want to repeat the experiment a few times.
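
On the question upthread about measuring the latency of individual writes: a minimal Python sketch, assuming a Unix-like system, is below. It is not a replacement for fio. It opens the file with O_SYNC rather than O_DIRECT (O_DIRECT needs aligned buffers, which is awkward from Python), so writes still pass through the page cache but each `os.write` should only return once the data has been pushed to the device. The function names and the default count are illustrative only.

```python
import os
import time


def measure_sync_write_latency(path, count=1000, block_size=4096):
    """Write `count` blocks sequentially with O_SYNC and return the
    latency of each write in seconds."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o600)
    latencies = []
    try:
        for _ in range(count):
            # Fresh random data for every block, similar in spirit to
            # fio's --refill_buffers, generated outside the timed region.
            buf = os.urandom(block_size)
            start = time.perf_counter()
            os.write(fd, buf)
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


def summarize(latencies):
    """Return count, median, 99th percentile and maximum, in milliseconds."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "count": len(ordered),
        "median_ms": ordered[len(ordered) // 2] * 1000,
        "p99_ms": ordered[p99_index] * 1000,
        "max_ms": ordered[-1] * 1000,
    }
```

Something like `print(summarize(measure_sync_write_latency("latency.test", count=100000)))` on a file located on the SSD under test gives a quick look at the tail. The maximum is the interesting number here, since a stall shows up as a single very slow write rather than as a shift in the median.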