Hacker News new | ask | show | jobs
by DenisM 5145 days ago
In fact inserts almost never incur a disk IO, not until the transaction commits. Even then the IO is sequential, in the log file, not random in the mdf file.

The mdf pages will get marked "dirty" at the time of insert, and they will get flushed to disk eventually - either when checkpint time comes, or if memory gets scarce and the dirty pages get evicted.

It might seem that postponing the write until checkpoint/eviction is not meaningful, but it actually serves a purpose - for one, the transaction is able to finish and return confirmation to the user very quickly, and then if the same page is touched twice (or more) between two flushes, all but the first touches are "free".

1 comments

This argument does not hold up to analysis. You can break it just by making your data set larger.
There are three options when it comes to relative RAM and data sizes:

1. All data fits in RAM 2. Active data fits in RAM, but inactive data does not 3. Active data does not fit in RAM

Case #1 is trivial, though still widely used for e.g. web sites.

Case #2 is where the relational databases have their sweet spot. For example you could have a 200Gb database with all customers and history of orders, but only a subset of those customers are placing orders at any one time, and all the new orders are stored close to each other, so only 8Gb of data is hot, and the rest is accessed occasionally. Assuming you have more than 8Gb of RAM, there will be no IO at the time of the insert, there will be some sequential IO in log file at the end of the transaction, and there will be some "random" IO come at checkpoint time. The latter will not be totally random as SQL Server will reorder all delayed writes in the most sequential manner possible to reduce the seeks (hence there is a good reason to space out checkpoints). The log flush is also sometimes shared between several adjacent transactions if they end up committing at the same time, so under load you can get less than one IO per transactions.

Case #3 is the one you refer to. In this case each btree update will push out some other old dirty page to disk, causing IO elsewhere in the database.

Now what you are saying in this comment is that because there exists case #3, there is no case #2. I think you severely underestimate the importance of the case #2.

I recommend a copy of "Inside SQL Server" by Kalen Delaney to learn more about SQL Server.

case 2 is fine, it just doesn't interest me