by bArray
2306 days ago
Surely if you know the hashes will be roughly evenly distributed, you can quickly estimate roughly where a key will be in the file? I'm not entirely sold on the speed gained by splitting into many files vs. doing a simple seek to an offset [1]. There's probably a fair bit of time lost traversing the filesystem's file/folder structure? Also, simply converting the numbers from ASCII to binary should save a good chunk of disk space (and make searching quicker)? Great write-up though, good to see a bunch of solutions tried.

[1] http://www.cplusplus.com/reference/cstdio/fseek/
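A minimal sketch of the "estimate where the key is, then seek" idea, assuming a hypothetical data file that is a sorted array of raw 20-byte binary hashes (e.g. SHA-1) with no separators. It does a single interpolation step from the first 4 bytes of the hash, seeks there with `fseek`, then walks a few records forward or backward. The names `HASH_LEN`, `estimate_index` and `find_hash` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout: sorted array of raw 20-byte hashes, no separators. */
#define HASH_LEN 20

/* Interpret the first 4 bytes of a hash as a big-endian integer. */
static uint32_t hash_prefix32(const unsigned char *h) {
    return ((uint32_t)h[0] << 24) | ((uint32_t)h[1] << 16) |
           ((uint32_t)h[2] << 8) | (uint32_t)h[3];
}

/* With uniformly distributed hashes, a hash with prefix p should sit near
 * record (p / 2^32) * n_records. Assumes 0 < n_records < 2^32. */
long estimate_index(const unsigned char *hash, long n_records) {
    assert(n_records > 0);
    uint64_t est = ((uint64_t)hash_prefix32(hash) * (uint64_t)n_records) >> 32;
    return (long)est; /* always < n_records */
}

/* Seek to record i and read it into buf; returns 1 on success. */
static int read_record(FILE *f, long i, unsigned char *buf) {
    if (fseek(f, i * (long)HASH_LEN, SEEK_SET) != 0) return 0;
    return fread(buf, 1, HASH_LEN, f) == HASH_LEN;
}

/* Returns the record index of `hash`, or -1 if it is not in the file. */
long find_hash(FILE *f, const unsigned char *hash, long n_records) {
    unsigned char buf[HASH_LEN];
    long i = estimate_index(hash, n_records);
    if (!read_record(f, i, buf)) return -1;
    int c = memcmp(buf, hash, HASH_LEN);
    int step = c < 0 ? 1 : -1; /* walk towards the target */
    while (c != 0) {
        i += step;
        if (i < 0 || i >= n_records) return -1;
        if (!read_record(f, i, buf)) return -1;
        int c2 = memcmp(buf, hash, HASH_LEN);
        /* Sorted file: if we stepped past where it would be, it is absent. */
        if ((step == 1 && c2 > 0) || (step == -1 && c2 < 0)) return -1;
        c = c2;
    }
    return i;
}
```

With an even distribution the estimate usually lands within a few records of the target, so the linear walk stays short. If the data were skewed you would want to switch to a binary search around the estimate instead.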
But an index file of 64-bit offsets, keyed on the first 2 or even 4 bytes of the hash, could easily be seeked into to read the offset for that prefix.
Though with 4 bytes, that becomes a 32 GiB index file (2^32 entries × 8 bytes each)! But it would probably be much faster, as you only do one seek in the index file, then another seek in the main file, then search a much shorter distance to the result!
If the system has enough RAM, the operating system will cache the files anyway, so it should be pretty fast, I think.
(Can you tell my first job involved writing ad-hoc database systems?)
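A rough sketch of the two-level lookup described above, using the 2-byte prefix variant: (65536 + 1) × 8 bytes, roughly 512 KiB of index. It assumes the same hypothetical sorted file of raw 20-byte hashes. Index entry `k` holds the byte offset of the first record whose prefix is >= `k`, and one extra end-of-file entry closes the last bucket. `build_index` and `lookup` are illustrative names, not from any library. Offsets are stored in native byte order, so the index is only portable between machines of the same endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HASH_LEN 20        /* hypothetical: raw 20-byte hashes */
#define N_BUCKETS 65536L   /* 2-byte prefix -> 2^16 index entries */

static unsigned prefix16(const unsigned char *h) {
    return ((unsigned)h[0] << 8) | h[1];
}

/* Writes N_BUCKETS + 1 uint64 entries: entry k = byte offset of the first
 * record with prefix >= k; the final entry is the data file's end.
 * The data file must already be sorted. Returns 1 on success. */
int build_index(FILE *data, FILE *index) {
    unsigned char buf[HASH_LEN];
    uint64_t offset = 0;
    long next_bucket = 0;
    if (fseek(data, 0, SEEK_SET) != 0) return 0;
    while (fread(buf, 1, HASH_LEN, data) == HASH_LEN) {
        long p = (long)prefix16(buf);
        while (next_bucket <= p) { /* buckets up to p start at this record */
            if (fwrite(&offset, sizeof offset, 1, index) != 1) return 0;
            next_bucket++;
        }
        offset += HASH_LEN;
    }
    while (next_bucket <= N_BUCKETS) { /* empty tail buckets + end entry */
        if (fwrite(&offset, sizeof offset, 1, index) != 1) return 0;
        next_bucket++;
    }
    return fflush(index) == 0;
}

/* Returns the byte offset of `hash` in the data file, or -1 if absent. */
long long lookup(FILE *index, FILE *data, const unsigned char *hash) {
    uint64_t range[2]; /* [start of this bucket, start of the next one) */
    unsigned char buf[HASH_LEN];
    /* Seek 1: jump straight to this prefix's slot in the index file. */
    long slot = (long)prefix16(hash) * (long)sizeof(uint64_t);
    if (fseek(index, slot, SEEK_SET) != 0) return -1;
    if (fread(range, sizeof(uint64_t), 2, index) != 2) return -1;
    /* Seek 2: jump to the start of the bucket in the data file. */
    if (fseek(data, (long)range[0], SEEK_SET) != 0) return -1;
    /* Scan only this bucket: about n / 65536 records for uniform hashes. */
    for (uint64_t off = range[0]; off < range[1]; off += HASH_LEN) {
        if (fread(buf, 1, HASH_LEN, data) != HASH_LEN) return -1;
        int c = memcmp(buf, hash, HASH_LEN);
        if (c == 0) return (long long)off;
        if (c > 0) break; /* sorted: we have passed where it would be */
    }
    return -1;
}
```

This is exactly two seeks per lookup plus a short scan, and the small index is likely to stay in the OS page cache. Note that `fseek` takes a `long`, so on platforms where `long` is 32 bits a data file over 2 GiB would need `fseeko` (POSIX) or a platform equivalent.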