by justsomehnguy 1493 days ago
> The dogma at the moment is that SSDs don't require de-fragmentation

They don't require regular de-fragmentation the way HDDs do, because if you are just occasionally reading some files it will be fast enough, AND with the physical layer hidden behind remapping it doesn't make sense at all: a file that is logically presented to you as one contiguous block could really be stored across multiple physical locations.[0]

> I know defragging the files (copy away, delete files and replace)

And this is the one real way to "defrag" on SSD-backed media. Tossing clusters around like it is an HDD only wastes your TBW.[1]
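
A rough sketch of the "copy away and replace" approach for a single file, in Python. The function name `rewrite_file` is made up for illustration, and this assumes plain files: `shutil.copy2` keeps timestamps and basic metadata, but I would not count on it preserving everything (ACLs, alternate data streams). Whether the new copy ends up laid out any better is entirely up to the filesystem.

```python
import os
import shutil
import tempfile


def rewrite_file(path):
    """Copy a file to a temporary name in the same directory, then swap it in.

    Hypothetical helper: the filesystem allocates fresh space for the copy,
    and the old extents are released once the original is replaced.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the final rename stays on one volume.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".rewrite_")
    os.close(fd)
    try:
        shutil.copy2(path, tmp)   # sequential read + sequential write
        os.replace(tmp, path)     # replaces the original with the new copy
    except BaseException:
        # Clean up the temporary copy if anything went wrong before the swap.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

Each file is written exactly once, which is the point: no shuffling of individual clusters back and forth.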

> While there is no particular reason why the SSD would run slow once you try to read a file from the filesystem it does slow down and it can impact performance dramatically

There are always a couple of factors that affect performance.

There is always the question of what exactly you are reading: a bazillion < 1KB files could be anywhere on the physical storage, and while the access time for a single file could be as fast as the SSD can provide, the pattern of accessing thousands of small files not only fills the IO queue but also wastes tons of time on overhead. For every file you access there is not only "Hey SSD, grab bytes at LBA 44444 to 5555", but also the aforementioned IO queue, parsing the MFT for the file's location (LBA), reading and parsing the DACL, allocating handles (and discarding them later), etc. And if you run out of caches (most notably the DRAM cache on your SSD) then of course things start to slow down to a crawl, especially if you are not only reading those files but also doing other things on the same drive at the same time.
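
If you want to see the per-file overhead for yourself, here is a minimal Python sketch that reads the same number of bytes once as thousands of tiny files and once as a single file. All names (`make_small_files`, `read_many`, `read_one`, `compare`) are made up for this example. Keep in mind that freshly written files will most likely be served from the OS page cache, so this mostly shows the software overhead (open, metadata lookup, handle allocation, close), not the SSD itself.

```python
import os
import tempfile
import time


def make_small_files(directory, count, size):
    """Create `count` files of `size` bytes each, plus one combined file."""
    payload = b"x" * size
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"small_{i:05d}.bin")
        with open(path, "wb") as f:
            f.write(payload)
        paths.append(path)
    combined = os.path.join(directory, "combined.bin")
    with open(combined, "wb") as f:
        f.write(payload * count)
    return paths, combined


def read_many(paths):
    """Open, read and close every file; return (total bytes, seconds)."""
    start = time.perf_counter()
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            total += len(f.read())
    return total, time.perf_counter() - start


def read_one(path):
    """Read a single file in one go; return (total bytes, seconds)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        total = len(f.read())
    return total, time.perf_counter() - start


def compare(count=2000, size=512):
    """Read the same payload as many small files and as one large file."""
    with tempfile.TemporaryDirectory() as d:
        paths, combined = make_small_files(d, count, size)
        many_bytes, many_s = read_many(paths)
        one_bytes, one_s = read_one(combined)
    return many_bytes, many_s, one_bytes, one_s


if __name__ == "__main__":
    many_bytes, many_s, one_bytes, one_s = compare()
    print(f"{many_bytes} bytes from many files: {many_s:.4f}s")
    print(f"{one_bytes} bytes from one file:   {one_s:.4f}s")
```

The byte counts are identical by construction; only the number of open/read/close round trips differs.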

Also, while I'm mentioning the MFT: some small files are stored in it entirely[2], so all the overhead is processed quickly (because under normal conditions most of the MFT is cached in memory anyway), but the file has to be small enough[3].

Also don't forget that if your file is 1KB the drive doesn't read 1KB from the storage. At best it reads 4KB (the default NTFS cluster size), but if your next file isn't in this block (or it is, but by the time it comes to read it the cache for this block has already been flushed) then you need to wait until the previous read completes. Yes, reads are fast, in theory, but again this is where the IO queue, caches and NCQ start to matter.
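
The arithmetic, as a small Python sketch. The function names are made up, and it assumes reads happen in whole clusters with the 4KB default mentioned above; it ignores files small enough to live inside the MFT.

```python
def bytes_actually_read(file_size, cluster_size=4096):
    """Round a file size up to whole clusters (assumed 4 KB by default)."""
    if file_size <= 0:
        return 0
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size


def amplification(file_size, cluster_size=4096):
    """Ratio of bytes fetched from storage to bytes you actually wanted."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    return bytes_actually_read(file_size, cluster_size) / file_size


print(bytes_actually_read(1024))  # 4096
print(amplification(1024))        # 4.0
```

So for a pile of 1KB files you are moving roughly four times the data you asked for, before any of the per-file overhead is counted.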

And last but not least: on Windows there is always the question of whether the antivirus software (be it the built-in Defender or a 3rd-party one) is still sane or is wasting your time rechecking all your already checked, static, non-executable files. Like a bazillion JSONs.

[0] and without TRIM support you can't have even a very loose guarantee that you really cleared the block.

[1] back in the day I used this to defrag very heavily fragmented HDDs: just Ghost it to another drive and then restore it back. All files end up defragmented and it takes way less time because the source drive only reads, instead of read-write-repeat.

[2] https://superuser.com/questions/1185461/maximum-size-of-file...

[3] just checked a couple of random files on my drive - the cutoff is somewhere around ~700B.