by aphyr
5588 days ago
"Bitcask is an in memory datastore because it keeps a copy of the entire dataset in memory."
Begging your pardon, but I think you may be misunderstanding bitcask. The bitcask keydir is stored in memory, but the values are stored on disk. The keydir is a hash mapping each key to a file ID and the offset/size in that file at which the value is stored. The only time values are held in memory is when the kernel's fs cache or readahead buffer provides them. http://downloads.basho.com/papers/bitcask-intro.pdf
Since a filesystem directory listing is likely held by your OS cache, you should see similar performance between bitcask and files on disk: an in-memory lookup to obtain the inode/offset, and a disk seek+read.
Riak will be substantially slower than directly using bitcask, however, because you may need to talk over the network to as many as N nodes, wait for all their responses, compute the resultant vclock/metadata, and then serialize it for HTTP (huge TCP overhead) or protocol buffers (relatively fast). If you're running on a single machine, then you may incur additional time for that machine to do the work that would normally be distributed over three nodes.
Without knowing more about your benchmark, however, it's difficult to say.
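To make the keydir idea concrete, here is a rough Python sketch of the layout described above: an in-memory hash that maps each key to a file ID plus offset/size, with the values themselves living only on disk. This is not bitcask's actual code or on-disk format; the class name TinyCask, the single data file, and the fixed file ID of 0 are all assumptions made for illustration.

```python
import os


class TinyCask:
    """Toy append-only store: keydir in memory, values on disk.

    Hypothetical illustration only; real bitcask uses multiple data
    files, CRCs, timestamps, and merging, none of which appear here.
    """

    FILE_ID = 0  # assumption: a single data file, so the file ID is constant

    def __init__(self, path):
        self.path = path
        self.keydir = {}  # key -> (file_id, offset, size)
        open(path, "ab").close()  # create the data file if it is missing

    def put(self, key, value):
        # Append the value, then record where it landed in the keydir.
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(value)
        self.keydir[key] = (self.FILE_ID, offset, len(value))

    def get(self, key):
        # One in-memory lookup, then one seek+read against the data file.
        _file_id, offset, size = self.keydir[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(size)
```

Overwriting a key just appends a new value and repoints the keydir entry, so memory use grows with the number of keys rather than with the size of the values.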
|
Yes, I was; I apologize. It must have been MongoDB that required the total data to be no larger than the amount of RAM.
The test was run using protocol buffers and the Python client on the same computer, comparing reads against a naive map reduce. The naive map reduce stored 4,000 files in a folder, treated the filename as the key, and parsed each text file's contents from JSON into a Python dictionary to see whether an attribute matched. Basically, I figured that a loop of thousands of blocking disk accesses on a laptop hard drive, on a standard filesystem with nothing buffered in memory, should always be much slower than any database.
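Roughly, the naive side of that comparison looks like the sketch below. It is a reconstruction rather than the original benchmark script: the function name naive_scan and its parameters are made up for illustration, and it assumes every file in the folder holds one JSON object.

```python
import json
import os


def naive_scan(folder, attribute, wanted):
    """Return the filenames (keys) whose JSON object has attribute == wanted."""
    matches = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        # One blocking open+read+parse per key.
        with open(path, "r", encoding="utf-8") as f:
            doc = json.load(f)
        if isinstance(doc, dict) and doc.get(attribute) == wanted:
            matches.append(name)
    return matches
```

Calling naive_scan(folder, "color", "red") would return every filename whose document has "color" set to "red".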
This was last year, so maybe Riak's performance has improved since then. I'd be interested if TokyoCabinet were added as a backend.