by davidhollander
5588 days ago
Bitcask is an in-memory datastore because it keeps a copy of the entire dataset in memory. 4GB of data == 4GB of memory > the trivial amount of memory used by a file descriptor.
> keeps values on disk in log-structured files
Of course it does. Log files are the easiest way to make an in-memory data structure persistent. Logs are generally used for periodically dropping fast writes as a backup in case you want to reload the in-memory data structure in the future. Read requests are structured and served from the in-memory data structure and do not require a disk access. The fact that something using all your RAM, which should require 0 disk accesses for reads, is NOT substantially beating something performing thousands of disk accesses is a FUBAR situation.
|
Begging your pardon, but I think you may be misunderstanding Bitcask. The Bitcask keydir is stored in memory, but the values are stored on disk. The keydir is a hash table mapping each key to a file ID and the offset and size within that file at which the value is stored. The only time values are held in memory is when the kernel's filesystem cache or readahead buffer provides them.
http://downloads.basho.com/papers/bitcask-intro.pdf
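To make the keydir idea concrete, here is a rough Python sketch of the layout described above. It is not Bitcask's actual code or on-disk format: the class name `TinyCask`, the single data file, and the bare value-only records are all assumptions made for illustration, and real Bitcask records carry more than this (and the store spans multiple files). The point is only that RAM holds keys plus (file ID, offset, size), while every read of a value goes to the file.

```python
import os
import tempfile


class TinyCask:
    """Toy store: keys and value locations in RAM, values in an append-only file."""

    def __init__(self, path):
        self.path = path
        self.file_id = 0   # hypothetical: this sketch only ever has one data file
        self.keydir = {}   # key -> (file_id, offset, size); no values kept here
        open(path, "ab").close()  # create the data file if it does not exist

    def put(self, key, value):
        # Append the value to the end of the log and remember where it landed.
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(value)
        # An overwrite just repoints the key; the old bytes stay in the log.
        self.keydir[key] = (self.file_id, offset, len(value))

    def get(self, key):
        # In-memory lookup for the location, then one seek + read on disk.
        _file_id, offset, size = self.keydir[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = TinyCask(os.path.join(d, "data.log"))
        db.put("a", b"hello")
        db.put("a", b"bye")
        print(db.get("a"), db.keydir["a"])
```

Note that memory use here grows with the number of keys, not with the total size of the values, which is the distinction being argued in this thread.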
Since a filesystem directory listing is likely held by your OS cache, you should see similar performance between bitcask and files on disk: an in-memory lookup to obtain the inode/offset, and a disk seek+read.
Riak will be substantially slower than directly using bitcask, however, because you may need to talk over the network to as many as N nodes, wait for all their responses, and compute the resultant vclock/metadata, and then serialize it for HTTP (huge TCP overhead) or protocol buffers (relatively fast). If you're running on a single machine, then you may incur additional time for that machine to run what would normally be distributed over three nodes. Without knowing more about your benchmark, however, it's difficult to say.