by ploxiln 4020 days ago
Originally TRIM was an un-queued command; all writes had to be flushed, then TRIM executed, then writes could continue. This was bad for performance with automatic on-file-delete trim, so everyone wanted a trim command that could be put in the command queue along with writes. Many new drives have this.
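
To make the cost concrete, here is a toy model of the difference. It is only a sketch: the function name and the command strings are made up for illustration, and it ignores real details such as queue depth limits and command timing. It just counts how often in-flight commands have to be drained before a TRIM can run.

```python
from typing import List


def count_queue_drains(commands: List[str], trim_is_queued: bool) -> int:
    """Count how many times in-flight commands must be drained.

    Toy model (hypothetical, not kernel code): an un-queued TRIM can only
    run once everything already in flight has completed, so it forces a
    drain. A queued TRIM simply sits in the queue next to the writes.
    """
    drains = 0
    in_flight = 0
    for cmd in commands:
        if cmd == "TRIM" and not trim_is_queued:
            # Everything queued so far must finish before TRIM may execute.
            if in_flight > 0:
                drains += 1
                in_flight = 0
            # The TRIM itself runs alone, so nothing is in flight afterwards.
        else:
            # Writes (and queued TRIMs) just join the queue.
            in_flight += 1
    return drains
```

With the sequence WRITE, WRITE, TRIM, WRITE, TRIM, TRIM, WRITE, the un-queued model drains twice and the queued model never drains.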

It turns out that Samsung 8XX SSDs advertise support for queued TRIM, but their implementation is buggy. The old un-queued TRIM command works fine.

https://lkml.org/lkml/2015/6/10/642

There are in fact lots of "quirks lists" and "blacklists" in the kernel, and virtually every computer needs some workarounds in the Linux kernel for some buggy hardware it contains. Pretty amazing when you think about it.
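
Roughly, such a list boils down to a table of device-name patterns and flags that is consulted before a feature is used. The following is a sketch of the idea only, not the kernel's actual code: the flag names, the table entries, and the vendor strings are all invented for illustration.

```python
import enum
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple


class Quirk(enum.IntFlag):
    """Hypothetical quirk flags; the real kernel uses its own constants."""
    NONE = 0
    NO_NCQ = 1        # do not use native command queuing at all
    NO_NCQ_TRIM = 2   # do not use queued TRIM, fall back to un-queued TRIM


# (model pattern, firmware pattern or None for "any", quirks).
# These entries are made up; they do not name real devices.
QUIRK_TABLE: List[Tuple[str, Optional[str], Quirk]] = [
    ("ExampleVendor SSD 8*", None, Quirk.NO_NCQ_TRIM),
    ("OtherVendor HD100", "FW1.*", Quirk.NO_NCQ),
]


def lookup_quirks(model: str, firmware: str) -> Quirk:
    """Collect the quirks of every table entry matching this device."""
    quirks = Quirk.NONE
    for model_pat, fw_pat, flags in QUIRK_TABLE:
        model_ok = fnmatchcase(model, model_pat)
        fw_ok = fw_pat is None or fnmatchcase(firmware, fw_pat)
        if model_ok and fw_ok:
            quirks |= flags
    return quirks


def use_queued_trim(advertised: bool, model: str, firmware: str) -> bool:
    """Trust the drive's claim only if no matching quirk forbids it."""
    blocked = lookup_quirks(model, firmware) & (Quirk.NO_NCQ | Quirk.NO_NCQ_TRIM)
    return advertised and not blocked
```

The point of the design is that the drive's own claim (`advertised`) is never enough by itself; a matching table entry overrides it.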

EDIT: another closely related example is MacBook Pro SSDs and NCQ, aka native command queuing. They claim to support it, but on many of them it's buggy. It gets better though; the Linux kernel only started trying to use such functionality by default relatively recently.

https://bugzilla.kernel.org/show_bug.cgi?id=60731

These sorts of things are, as you can see, very confusing and frustrating to track down, identify, and find a general fix for.

EDIT2: now that I've actually read further into the kernel bugzilla entry, it has more recently come to light that the actual problem with recent MacBook Pro SSDs is MSI (Message Signaled Interrupts, a more efficient type of interrupt).

4 comments

In essence, the Linux kernel puts on display what on Windows is hidden by proprietary device drivers.
The thing is, almost all hardware accessed through drivers has tons of bugs; at least, it is nowhere near as close to "bug-free" as things like CPUs or DRAM, which cannot hide their bugs behind drivers. What one can hope will work reasonably is a piece of hardware plus an accompanying driver that knows how to hide that hardware's issues.

So another way of putting what you said would be "on Linux there's no working driver for that piece of hardware, unlike on Windows where the 'proprietor' went to the trouble of supplying such a driver."

If you think CPUs do not come with a shit-ton of hardware bugs YOU ARE GRAVELY MISTAKEN.

Google up the Intel errata for the i7

The list goes on and on.

I didn't read him as claiming that, just that CPUs do not have as many bugs as other hardware, which I think is quite true. With CPUs a larger portion of the bugs are found, and smaller bugs matter because they are not hidden by proprietary drivers.
FWIW, memory has plenty of bugs too. With respect to the original point, these are usually not visible to drivers (unless you count EDAC) because they're handled at the chipset level. However, for certain kinds of systems - especially embedded - that don't have chipsets, these issues can become painfully visible. My own exposure to this was at SiCortex, where the memory logic was directly on the same single die as everything else that comprised a node.
Heh, I still recall my early encounters with Linux and reading the bootup messages.

One of them contained a line related to having found a CPU bug and having put a workaround in place.

I am not entirely sure, but I think it may have been the F00F bug.

https://en.wikipedia.org/wiki/Pentium_F00F_bug

Ha. Reminded me of the Pentium Floating Point Bug from the 90s. First (only?) time a CPU bug has been an international press story?

https://en.wikipedia.org/wiki/Pentium_FDIV_bug

That's one of the things drivers are for: to work around hardware bugs.

Among the challenges faced by the AMCC 3ware RAID HBAs were faulty motherboards.

"But PCI is a standard!" you quite reasonably protest.

Yes, and the US Constitution guarantees us many inalienable rights.

Since your comment seems to be the highest voted and is showing on top, could you update your bit about queued TRIM with this:

https://news.ycombinator.com/item?id=9724192

I can no longer edit my comment.

I assumed that these drives had the same controller chip and the same firmware base as the consumer Samsung SSDs, but with higher quality NAND and some firmware tweaks. It's very hard to find technical details about these enterprise drives on the internet (compared to the consumer drives).

I guess the smartctl command proves it: these enterprise Samsung SSDs do not have queued TRIM enabled.

It would make sense for enterprise drives to be more conservative and to lag on feature set. But it's very surprising that enterprise drives are corrupted by the original un-queued TRIM; they're supposed to have more validation, and that's a very common feature.

In this case the TRIM command was un-queued, which makes it worse.
It sounds to me like even when the fstrim utility is used, which issues an ioctl() (FITRIM) telling the kernel to trim free regions in a range of a filesystem, the kernel ends up issuing the queued TRIM command if it is available.
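
For anyone curious what that ioctl() looks like from userspace, here is a minimal sketch. Assumptions to be aware of: the request number is computed with the common Linux ioctl encoding (as on x86 and ARM; some architectures encode it differently), the helper names are my own, and actually running `trim_filesystem` needs root and a filesystem that supports discard. Nothing here controls whether the kernel then sends a queued or un-queued TRIM to the drive; that decision is made further down the stack.

```python
import fcntl
import os
import struct

# struct fstrim_range { __u64 start; __u64 len; __u64 minlen; }
FSTRIM_RANGE_FORMAT = "=QQQ"

# _IOWR('X', 121, struct fstrim_range) using the common Linux encoding:
# direction (read|write = 3) << 30 | size << 16 | type << 8 | number.
FITRIM = (3 << 30) | (struct.calcsize(FSTRIM_RANGE_FORMAT) << 16) | (ord("X") << 8) | 121


def pack_fstrim_range(start: int = 0, length: int = 2**64 - 1, minlen: int = 0) -> bytes:
    """Build the argument buffer; the defaults ask for the whole filesystem."""
    return struct.pack(FSTRIM_RANGE_FORMAT, start, length, minlen)


def trim_filesystem(mount_point: str) -> int:
    """Ask the kernel to discard free space; returns bytes trimmed.

    Needs root and a Linux filesystem with discard support.
    """
    fd = os.open(mount_point, os.O_RDONLY)
    try:
        buf = bytearray(pack_fstrim_range())
        # The kernel writes the number of bytes trimmed back into 'len'.
        fcntl.ioctl(fd, FITRIM, buf)
        _start, trimmed, _minlen = struct.unpack(FSTRIM_RANGE_FORMAT, buf)
        return trimmed
    finally:
        os.close(fd)
```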

The "blacklist" does not appear to have any constant to blacklist old-style trim, only NCQ_TRIM (and other odd stuff, most notably all NCQ usage).

This makes sense, because if some SSD advertised old-style trim but was corrupted by it, then it would be found and fixed sooner by these vendors, because Windows 7 would exhibit the corruption.

I see the addendum to your post; touché. I guess these drives do indeed lack queued TRIM and have some issue with plain old TRIM. That's rather surprising to me... I was going to say "especially for an enterprise-grade drive" but I'm not so sure...
"workarounds in the kernel."

Please permit me to violate my NDA:

/* MacWrite needs this */

... in Mac OS System 7.5.2. I honestly don't know whether MacWrite still needed it but that code was there to work around a bug.