I'm reading over the source with my coffee this morning. I'll pseudo-CR it if I see mistakes (commented on a typo already). Hope you don't mind! I'm interested to see how you are doing this because I'm playing with my own toy key-value store on the week-ends [1].
In [2], wondering why you make GOMAXPROCS=2*Cpu by default?
I experiment with different GOMAXPROCS settings on three machines and noticed that 1CPU does not run tiedot to its full potential, 3CPU seems to be slower than 2*CPU.
With GOMAXPROCS=n*CPU, n is roughly the amount of pre-emptive (vs the built-in cooperative) multitasking that you want going on, with 1 being none. Handled by the OS, of course. Interesting that you noticed a speed up > 1.
I didn't think about that, I'll write that down in my checklist of things to do when testing/benchmarking my projects... like another dimension to take care of when testing. Aside from domain-range, good data, bad data, edge cases... and other parameters - now add to that concurrency scenarios.
Regarding point 2, I made a note here:
https://github.com/HouzuoGuo/tiedot/wiki/Embedded-Usage
I experiment with different GOMAXPROCS settings on three machines and noticed that 1CPU does not run tiedot to its full potential, 3CPU seems to be slower than 2*CPU.