by AaronFriel 4607 days ago
The LMDB statistics are very strange - why is synchronous SSD performance worse on most figures than HDD performance? Something seems very wrong with these benchmarks:

    Section 5 (SSD) F (Synchronous Writes)
    
    Random Writes
    
    LevelDB              342 ops/sec
    Kyoto TreeDB          67 ops/sec
    SQLite3              114 ops/sec
    MDB                  148 ops/sec
    MDB, no MetaSync     322 ops/sec
    BerkeleyDB           291 ops/sec
    
    Section 8 (HDD) F (Synchronous Writes)
    
    Random Writes
    
    LevelDB             1291 ops/sec
    Kyoto TreeDB          28 ops/sec
    SQLite3              112 ops/sec
    MDB                  297 ops/sec
    BerkeleyDB           704 ops/sec
    
Really? LevelDB is nearly four times faster on an HDD than on an SSD with synchronous writes? BerkeleyDB is over twice as fast?

This smells.
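
To put numbers on that, here is a quick sketch that computes the HDD/SSD ratio for each engine from the figures quoted above. The dictionary and function names are made up for illustration; only the ops/sec values come from the benchmark page as quoted.

```python
# Synchronous random-write figures (ops/sec) as quoted above.
# "MDB, no MetaSync" is left out because it only appears in the SSD section.
SSD_OPS = {"LevelDB": 342, "Kyoto TreeDB": 67, "SQLite3": 114, "MDB": 148, "BerkeleyDB": 291}
HDD_OPS = {"LevelDB": 1291, "Kyoto TreeDB": 28, "SQLite3": 112, "MDB": 297, "BerkeleyDB": 704}


def hdd_over_ssd(ssd, hdd):
    """Return {engine: HDD ops/sec divided by SSD ops/sec} for engines in both tables."""
    return {name: hdd[name] / ssd[name] for name in ssd if name in hdd}


ratios = hdd_over_ssd(SSD_OPS, HDD_OPS)
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    verdict = "HDD faster" if ratio > 1 else "SSD faster"
    print(f"{name:<14} {ratio:5.2f}x  ({verdict})")
```

A ratio above 1 means the HDD run reported more synchronous writes per second than the SSD run for that engine.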

5 comments

I would guess the answer is SSD write amplification. To overwrite data, an SSD first has to erase the block it lives in. SSDs also try to minimize wear, so internally they spread the data around as it gets written. Maybe someone else with more experience can explain more, but that's my guess.
Keep in mind, the HDD was using ext2 and the SSD was using reiserfs. Synchronous writes on ext2 are faster than on any journaling filesystem.
Not three orders of magnitude faster, which is the difference between HDD and SSD random writes.
All of the source code is available on that page, you're welcome to rerun it on your own hardware configuration.
Three orders of magnitude faster would mean 1000x faster. You probably meant 3 times faster.
SSDs really are 1000x faster at random writes (~200,000 IOPS vs ~200 IOPS)
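
For what it's worth, the arithmetic behind "three orders of magnitude" can be checked in a couple of lines. The two IOPS values are the rough figures from the comment above, not measurements, and the function name is just for illustration.

```python
import math


def orders_of_magnitude(fast, slow):
    """Return log10 of the speed ratio, i.e. how many factors of ten apart."""
    return math.log10(fast / slow)


ssd_iops = 200_000  # approximate SSD random-write IOPS, as quoted above
hdd_iops = 200      # approximate HDD random-write IOPS, as quoted above

print(f"ratio: {ssd_iops / hdd_iops:.0f}x")
print(f"orders of magnitude: {orders_of_magnitude(ssd_iops, hdd_iops):.1f}")
```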
> The LMDB statistics are very strange - why is synchronous SSD performance worse on most figures than HDD performance?

Could it be that most database engines are based on algorithms that were developed before SSDs were significant, and were extremely optimized for HDD performance?

No. Most database engines are actually so old they were developed even before HDDs hit the seek wall.
Paging hyc_symas. Howard Chu isn't shy about talking about benchmark results and he can be found here and on twitter.
Hello hello! Pretty sure what actually happened in these is that the HDD's internal cache was still active, while the Crucial M4 SSD has no internal cache. The only other explanation is that I screwed up my partition offset on the SSD but I already double-checked that and the partitions were all 2MB aligned.
I'm going to compile up LMDB and bench it on a 96GB DL380g8 with quad 3TB ioDrive-2s. Should be interesting to see how various database sizes play out, and what the write amp looks like. I am not seeing much about LMDB's NUMA awareness -- guess I need to keep digging.
For reads we get linear scaling out to 64 cores. Using cache-aligned data structures plays a big part in that for NUMA. (At the moment that's the largest machine we have in our lab.) For writes, there's basically no scaling. Write amplification is O(log N), proportional to tree height.
If the HDD's internal cache was active, how are synchronous writes still faster? Shouldn't the flush/sync ensure persistence of data?
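
A rough way to picture "proportional to tree height": assume a copy-on-write B+tree in which every update rewrites one page per level, from the leaf up to the root. The sketch below is a simplified model under that assumption. The fanout of 100 and the 4096-byte page size are illustrative guesses, not LMDB's actual parameters, and the function names are hypothetical.

```python
def tree_height(n_entries, fanout):
    """Smallest number of levels such that fanout**levels >= n_entries."""
    if n_entries < 1 or fanout < 2:
        raise ValueError("need n_entries >= 1 and fanout >= 2")
    height, capacity = 1, fanout
    while capacity < n_entries:
        capacity *= fanout
        height += 1
    return height


def bytes_written_per_update(n_entries, fanout=100, page_size=4096):
    """Model: one rewritten page per tree level for a single small update."""
    return tree_height(n_entries, fanout) * page_size


for n in (10**3, 10**6, 10**9):
    print(f"N={n:>13,}  height={tree_height(n, 100)}  "
          f"bytes per update={bytes_written_per_update(n):,}")
```

In this model the bytes written per update grow with log N rather than with N, which is the shape of the claim above. Real numbers depend on key and value sizes, page fill, and how the engine batches commits.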

If the cache was being used, the HDD results are not actually synchronous: a power loss event would result in data loss.
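
For anyone who wants to probe this on their own hardware, here is a minimal sketch of a synchronous-write loop: append a small record, call `fsync`, repeat, and report operations per second. This is not the benchmark code from the page; the function name, record size, and operation count are arbitrary choices for illustration. Whether `fsync` actually reaches stable media depends on the OS, the filesystem, and the drive's write-cache settings, which is exactly the question raised above.

```python
import os
import tempfile
import time


def sync_write_ops_per_sec(path, n_ops=200, value_size=100):
    """Append n_ops records, calling fsync after each one, and return ops/sec."""
    payload = b"x" * value_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(n_ops):
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to push this file's data to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return n_ops / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Uses the system temp directory; point it at the disk you want to test.
    with tempfile.TemporaryDirectory() as tmp:
        rate = sync_write_ops_per_sec(os.path.join(tmp, "sync_test.dat"))
        print(f"{rate:.0f} synchronous appends/sec")
```

If a spinning disk reports far more synchronous appends per second than its rotation rate would seem to allow, a volatile write cache acknowledging writes early is one plausible explanation.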

I'm not sure, but I think HDDs are better at sequential operations than SSDs (which perform better at random operations). Some people say MySQL performs better on a high-speed HDD than on an SSD.