by psadauskas
6241 days ago
I agree, plus TC has a ton of parameters that can be tweaked, and the defaults are pretty small. The one that has the most pronounced effect is the bucket array size (bnum), or the "width" of the hash table. The bigger it is, the less chance of collisions, and every collision means you have to follow a linked list to find the exact record. He used 11M keys, so a bnum in the range of 40M would be much quicker. I benchmarked TC b+tree on a 1TB db with ~350M keys, and it worked great. I would publish the numbers, but I'm embarrassed that they aren't very rigorous. The cdb docs say it has a limit of 4GB, which makes it pretty much worthless for anything I would use it for.
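To put rough numbers on the bucket-size point, here is a back-of-the-envelope sketch in Python. It assumes keys hash uniformly across buckets, and the 131071 figure is what I believe the TC hash database default bnum to be, so treat it as an assumption and check the docs. The function names are made up for illustration and are not part of any TC API.

```python
def load_factor(num_keys: int, bnum: int) -> float:
    """Average number of keys per bucket."""
    return num_keys / bnum


def expected_neighbors(num_keys: int, bnum: int) -> float:
    """Expected number of OTHER keys sharing a bucket with a given key,
    assuming every key lands in a uniformly random bucket."""
    return (num_keys - 1) / bnum


if __name__ == "__main__":
    keys = 11_000_000  # key count from the benchmark being discussed
    # 131071 is an assumed default; 40M is the bnum suggested above
    for bnum in (131_071, 40_000_000):
        print(f"bnum={bnum}: load factor {load_factor(keys, bnum):.3f}, "
              f"expected neighbors per key {expected_neighbors(keys, bnum):.3f}")
```

With a small bnum, each lookup has to wade through dozens of colliding records; with a bnum a few times the key count, most buckets hold at most one record.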
http://www.unixuser.org/~euske/doc/cdbinternals/index.html
The hash algorithm used also only produces a 32-bit key, meaning you'll be limited to 2^32 total records. Again, though, unless your data is of trivial size, that gives you considerably more room to work with than a hard 4GB limit.
Edit: doh! can't use double-asterisk for exponent on HN
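For a sense of how tight the 4GB file limit is compared to the 2^32 record limit, a quick sketch of the per-record byte budget. It ignores cdb's own per-record and header overhead, so the real budget is smaller; the function name is hypothetical.

```python
CDB_MAX_BYTES = 2 ** 32  # the 4GB file size limit mentioned above


def bytes_per_record(num_records: int) -> int:
    """Upper bound on average key+value bytes per record if the whole
    file must fit in 4GB (format overhead ignored)."""
    return CDB_MAX_BYTES // num_records


if __name__ == "__main__":
    # 11M keys from the benchmark, ~350M keys from my b+tree test
    for n in (11_000_000, 350_000_000):
        print(f"{n} records: at most {bytes_per_record(n)} bytes each")
```

So the file size limit bites long before the record count limit does, unless the records are tiny.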