by gregmac 2869 days ago
What sort of volume is this handling? Is the dataset in memory, or if not, what size and type of I/O is backing this? It seems like a ton of CPU, whereas in my experience typically I/O is the primary bottleneck for database loads.
3 comments

Note the second comment in the following thread:

https://news.ycombinator.com/item?id=12739771

quinthar posted in quite a lot of detail there as well, so it may provide some context.

The link says that it requires turning synchronous off, which means that you won't be waiting for real I/O on transactions, since no fsync calls are emitted. Add that to a huge amount of RAM for cache, and it's very reasonable to be CPU bound... So long as you don't mind data loss or corruption on power loss or kernel panic...
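For anyone who wants to see what "turning synchronous off" looks like in practice, here is a minimal sketch using Python's built-in sqlite3 module. The helper names (`open_db`, `synchronous_level`) and the throwaway database path are made up for illustration; this is not how the server in the linked thread is configured, just the plain SQLite pragma.

```python
import os
import sqlite3
import tempfile

# Values accepted by SQLite's "PRAGMA synchronous".
ALLOWED_LEVELS = {"OFF", "NORMAL", "FULL", "EXTRA"}

def open_db(path, synchronous="OFF"):
    """Open a database and set its synchronous level (hypothetical helper)."""
    if synchronous not in ALLOWED_LEVELS:
        raise ValueError(f"unknown synchronous level: {synchronous}")
    conn = sqlite3.connect(path)
    # With OFF, SQLite hands writes to the OS and does not wait for them
    # to reach the disk, so commits no longer block on real I/O.
    conn.execute(f"PRAGMA synchronous = {synchronous}")
    return conn

def synchronous_level(conn):
    """Return the current level as an integer (0 means OFF, 2 means FULL)."""
    return conn.execute("PRAGMA synchronous").fetchone()[0]

# Assumption: a temporary file stands in for the real database.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = open_db(path, "OFF")
level_off = synchronous_level(conn)
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()  # returns without waiting for the data to be flushed to disk
rows = conn.execute("SELECT x FROM t").fetchall()
conn.close()
```

The setting is per connection, so every connection that should skip the flush has to issue the pragma itself.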
Doesn't a BBU RAID controller largely address the power loss concern? And I always thought BBU RAID controllers were just absolute common sense for any serious database (until everyone went cloud and suddenly basics like dual network cards, dual PSUs and RAID controllers didn't fit Amazon's desire to sell complexity).
Well, data will be lost if anything below the OS buffer cache crashes (source: http://sqlite.1065341.n5.nabble.com/How-dangerous-is-PRAGMA-...).

The BBU protects from power loss of the HDs, but not power loss or general failure of the mainboard or any other important component.

Also, BBUs can run out of battery, so a flash-backed BBU is generally recommended.

Without fsync, the dirty page can sit in RAM and not even be sent to the disk, so a battery backup for the disk wouldn't solve data loss.
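To make the dirty-page point concrete, here is a rough sketch in Python of the difference between handing data to the OS and asking the OS to push it to the storage device. The function name `durable_write` and the temp file are assumptions for illustration only; whether the data survives after fsync returns still depends on the drive or controller honoring the flush.

```python
import os
import tempfile

def durable_write(path, data: bytes) -> int:
    """Write data and ask the OS to flush it to the device (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(data)         # data sits in the process's userspace buffer
        f.flush()             # now in the OS page cache: a dirty page in RAM
        os.fsync(f.fileno())  # blocks until the OS has pushed it to the device
    return os.path.getsize(path)

# Assumption: a temporary file stands in for a database file.
path = os.path.join(tempfile.mkdtemp(), "page.bin")
payload = b"x" * 4096
size = durable_write(path, payload)
```

Without the `os.fsync` call, the write can still be sitting in RAM when the machine loses power, which is the case a battery on the disk side cannot cover.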
Right, but you leave fsync on in this case, no? This might be incompatible with the locking required by this server-process edition though (I guess, no clue). But more generally speaking, fsync=on with a proper RAID controller gives a massive performance boost (like orders of magnitude) while being relatively resilient to power loss.
"in my experience typically I/O is the primary bottleneck for database loads"

Not when you have potentially nearly 3TB of memory cache.

Of course it all depends on the dataset.