Hacker News new | ask | show | jobs
by leif 5146 days ago
"and if there's enough data, the relevant b-tree pages will most certainly be hot"

That is not the case if you have indexes. Most secondary indexes are high-entropy (if they weren't, you probably wouldn't be storing them, is an emotional proof), so inserts on larger-than-memory data sets almost always incur a disk I/O, in a b-tree, at least.

Shameless plug: I have a talk about how to deal with this: http://www.youtube.com/watch?v=q6BnG74FZMQ

2 comments

I don't agree that most secondary indexes are high cardinality. Some definitely are, but there are plenty of indexes with low cardinality, simply used for quicker scanning of those relevant values.

First up, in his example we're talking about 10k rows with a size of 35 bytes. At 8050 bytes of available space per page, that gives a fan-out of 8050/35 = 230. 10k/230*8kb = 350kb. At 350kb of inserts (hobt data, he doesn't go into secondary indexes), memory is completely irrelevant - the only force going on here is latency on writing the log to disk.

If we had a huge data set (as in, did not fit in memory, at all) with high cardinality - sure, we'd have a lot of cold leaf level pages. With no further info on his case, I can only assume most of the hobt and secondary indexes will fit in memory. At worst we'll have to read a cold leaf level page into memory to perform the addition in-memory.

As is there's no mention of even a clustered index, causing all of this to be heap inserts which is arguably one of the fastest insert methods there are (barring certain very special cases).

I said most secondary indexes are high entropy.

If you have a large enough data set that you have to do a leaf read per insert (or close to that often), it will kill your throughput.

In fact inserts almost never incur a disk IO, not until the transaction commits. Even then the IO is sequential, in the log file, not random in the mdf file.

The mdf pages will get marked "dirty" at the time of insert, and they will get flushed to disk eventually - either when checkpint time comes, or if memory gets scarce and the dirty pages get evicted.

It might seem that postponing the write until checkpoint/eviction is not meaningful, but it actually serves a purpose - for one, the transaction is able to finish and return confirmation to the user very quickly, and then if the same page is touched twice (or more) between two flushes, all but the first touches are "free".

This argument does not hold up to analysis. You can break it just by making your data set larger.
There are three options when it comes to relative RAM and data sizes:

1. All data fits in RAM 2. Active data fits in RAM, but inactive data does not 3. Active data does not fit in RAM

Case #1 is trivial, though still widely used for e.g. web sites.

Case #2 is where the relational databases have their sweet spot. For example you could have a 200Gb database with all customers and history of orders, but only a subset of those customers are placing orders at any one time, and all the new orders are stored close to each other, so only 8Gb of data is hot, and the rest is accessed occasionally. Assuming you have more than 8Gb of RAM, there will be no IO at the time of the insert, there will be some sequential IO in log file at the end of the transaction, and there will be some "random" IO come at checkpoint time. The latter will not be totally random as SQL Server will reorder all delayed writes in the most sequential manner possible to reduce the seeks (hence there is a good reason to space out checkpoints). The log flush is also sometimes shared between several adjacent transactions if they end up committing at the same time, so under load you can get less than one IO per transactions.

Case #3 is the one you refer to. In this case each btree update will push out some other old dirty page to disk, causing IO elsewhere in the database.

Now what you are saying in this comment is that because there exists case #3, there is no case #2. I think you severely underestimate the importance of the case #2.

I recommend a copy of "Inside SQL Server" by Kalen Delaney to learn more about SQL Server.

case 2 is fine, it just doesn't interest me