|
(disclosure, I am the author of this post) Hi hn_decay, Our use case goes like this: a member database of over a 100M records, and a number of content databases each with tens or hundreds of millions of records (reviews, video, lists, wiki, this, that , the other thing, one content table with over a billion records, where all the content records had a member id. Our primary usage pattern is to grab a set of content records (say 10-200 at a time) with their member information. Putting everything in one database and doing the join there does not scale for us, and severely reduces flexibility. We would need to continue to scale up our hardware to handle the sum of the content sets, and new content sets are being created on a regular basis. By putting these all into different databases you then have the choice (not the necessity) of keeping them on one or more machines. You can put on one machine a bunch of content sets that are relatively small, and put the big ones on their own machine. You can also scale the hardware to individual content sets - infrequently accessed content sets do not have to be on powerful machines, very frequently accessed sets can be scaled on bigger machines. There are downsides, the two-query hit being the least significant, the extra query on a tuned database is on order of 1ms. Even if the hit was larger, I would still live it, scalability != performance Andy |
Having said that, hundreds of millions of records equals a small dataset. I still don't understand when that's held as some sort of edge case when it's easily accommodated on commodity low-end hardware.