Hacker News
by fizzynut 285 days ago
AI-generated slop. It constantly summarises various parts of the memory hierarchy, and has graphs with no x-axis, bad units, and no real-world examples. The final conclusion doesn't match the previous 10 summaries.

The big problem is that it misses a lot of nuance. If you actually try to treat an SSD like RAM and randomly read and/or write 4 bytes of data that isn't in a RAM cache, you will get performance measured in kilobytes per second, so literally 1,000,000x worse performance. The only way to get good SSD performance is to read or write large enough sequential chunks.
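
As a back-of-envelope check on that figure, here is a minimal sketch. It assumes one outstanding request at a time, roughly 100 µs per random SSD read, and roughly 40 GB/s of RAM bandwidth. Both hardware numbers are assumed round values, not measurements of any particular device.

```python
# Back-of-envelope model: one outstanding request at a time (queue depth 1),
# so throughput is bytes per request divided by time per request.
# Both hardware figures below are assumed round numbers, not measurements.

ASSUMED_SSD_READ_LATENCY_US = 100      # assumption: ~100 µs per random read
ASSUMED_RAM_BANDWIDTH_BYTES_S = 40e9   # assumption: ~40 GB/s memory bandwidth


def qd1_throughput_bytes_per_s(request_bytes: int, latency_us: float) -> float:
    """Bytes per second when each request must finish before the next starts."""
    return request_bytes * 1_000_000 / latency_us


tiny_reads = qd1_throughput_bytes_per_s(4, ASSUMED_SSD_READ_LATENCY_US)
slowdown = ASSUMED_RAM_BANDWIDTH_BYTES_S / tiny_reads

print(f"4-byte random reads: {tiny_reads / 1e3:.0f} KB/s")
print(f"vs. assumed RAM bandwidth: {slowdown:,.0f}x slower")
```

With those assumed inputs the model gives 40 KB/s, a factor of 1,000,000 below the assumed RAM bandwidth. Deeper queues and larger requests change the picture considerably, which is the point about chunk size.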

Generally, a random read or write of a small number of bytes costs about the same as one of a large chunk. If you're constantly hammering an SSD for a long time, the performance numbers also tank, and if that happens, your application, which was already under load, can stall in truly horrible ways.

This also ignores write endurance. Any data with a lifetime measured in, say, minutes should be in RAM; otherwise you can kill an SSD pretty quickly.
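
To put a rough number on "pretty quickly", here is a sketch that divides a rated endurance figure by a daily write volume. The 600 TBW rating and the 100 MB/s sustained write rate are assumed example values, and the model ignores write amplification inside the drive, which would shorten the result further.

```python
# Rough endurance estimate: rated terabytes written (TBW) divided by the
# amount written per day. Both inputs are assumed example values.

SECONDS_PER_DAY = 86_400


def days_until_tbw_exhausted(rated_tbw: float, write_mb_per_s: float) -> float:
    """Days of continuous writing before the rated TBW is used up."""
    tb_per_day = write_mb_per_s * SECONDS_PER_DAY / 1_000_000  # MB -> TB
    return rated_tbw / tb_per_day


# Assumptions: a drive rated for 600 TBW, written at a sustained 100 MB/s.
days = days_until_tbw_exhausted(rated_tbw=600, write_mb_per_s=100)
print(f"{days:.0f} days")
```

Under those assumptions the rated endurance is used up in a little over two months of continuous writing.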


SSDs have so many cases of odd behaviour. If you limit yourself to writing drive-sector-sized chunks, so 4 KB, then at some point you will run into erase issues, because the flash erase block size is considerably larger than the 4 KB sectors. You also run into the limits of the memory buffer and of the amount of fast SLC, which limits the long-term sustained write speed. There are lots of these barriers you can break through and watch performance drop sharply, and it's all implemented differently in each model.
Yes, it can be quite brand/technology specific, but chunk sizes of 4/8/16/etc. MB usually work much better for SSDs. The only data I've found that easily lines up with those chunk sizes are things like video, textures, etc., or cache buffers you fill in RAM and then write out in chunks.
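
The "fill a buffer in RAM, then write it out in chunks" approach can be sketched roughly as follows. `ChunkedWriter` is a hypothetical helper, not an existing library class, and the 8 MiB default is just one of the sizes mentioned above. On a real file the OS page cache may re-chunk the writes unless you bypass it, so treat this as an illustration of the buffering logic only.

```python
import io


class ChunkedWriter:
    """Accumulate small writes in RAM and emit them in fixed-size chunks."""

    def __init__(self, sink, chunk_size=8 * 1024 * 1024):  # 8 MiB: assumed size
        self.sink = sink
        self.chunk_size = chunk_size
        self.buffer = bytearray()

    def write(self, data: bytes) -> None:
        self.buffer += data
        # Emit every complete chunk; keep the remainder buffered.
        while len(self.buffer) >= self.chunk_size:
            self.sink.write(bytes(self.buffer[:self.chunk_size]))
            del self.buffer[:self.chunk_size]

    def close(self) -> None:
        # Flush the final partial chunk, if any.
        if self.buffer:
            self.sink.write(bytes(self.buffer))
            self.buffer.clear()


# Usage with an in-memory sink and a tiny chunk size so the effect is visible.
sink = io.BytesIO()
writer = ChunkedWriter(sink, chunk_size=16)
for _ in range(10):
    writer.write(b"abcde")  # 10 small writes of 5 bytes each
writer.close()
```

Here the 10 small writes reach the sink as three 16-byte chunks plus a 2-byte tail, instead of 10 separate 5-byte writes.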
Is this from experience, or are there any sources on what's sane to use today? I'm building a niche DB, and "larger blocks" has been the design direction, but how "far" to go has been a nagging question. (Also, are log-structured designs still a benefit?)
You are also going to cause a lot of write amplification with bigger blocks, and at some point that is going to limit your performance as well. What really makes this hard is that it depends on how full the drive is, how heavily the drive is utilised, and for how much of the day. A drive that has had time to garbage collect performs differently from one that has not.
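
A crude way to see the application-level side of this: if changing a small record means rewriting the whole block that contains it, the amplification is simply block size divided by bytes changed. The sketch below assumes that rewrite-the-whole-block model and a hypothetical 100-byte record. It says nothing about what the drive's own garbage collection adds on top.

```python
def write_amplification(block_bytes: int, changed_bytes: int) -> float:
    """Bytes rewritten per byte logically changed, if the whole block is rewritten."""
    return block_bytes / changed_bytes


# Assumption: each update touches a 100-byte record inside one block.
for block in (4 * 1024, 1024 * 1024, 16 * 1024 * 1024):
    waf = write_amplification(block, changed_bytes=100)
    print(f"{block // 1024:>6} KiB block: {waf:,.0f}x")
```

Under that model, amplification grows linearly with block size, so the block size that suits the drive has to be weighed against how much of each block an update actually changes.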

When you start trying to design tools to use SSDs optimally, you find it's heavily dependent on usage patterns, which makes it very hard to do in a portable way or in a way that accounts for changes in the business.

This project is not "business" bound; it's a DB abstraction, so business concerns are layered outside of it (but it's a worthwhile pursuit, since it rethinks some aspects I haven't seen in all the years of DB announcements here and elsewhere).

And yes, write amplification is one major concern, but the question is: considering how hardware has changed, how does one design to avoid it? Our classic 512-byte, 4 KB, etc. block sizes seem long gone, so do the systems "magically" hide that, or do we end up with unseen write amplification instead?