by dangoodmanUT 453 days ago
And hold on, 600ns can't possibly be right...

A memory copy plus updating whatever internal memory structures you have is definitely going to be over 1us. Even a non-fsync NVMe write is still >=1us, so this is grossly misleading.


our p50 is indeed 600ns for writes, measured the way I explained. I understand that at this point this can read as a "trust me bro" kind of statement, but I can offer you something: we can have a quick call, and I'll give you access to a temp server with HPKV installed and to our test suite, so you'll have a chance to run your own tests.

this can be a good learning opportunity for both of us (potentially more for us) :)

if you're interested, please send us an email to support@hpkv.io and we can arrange that

for the time being, have a look at this please: http://hpkv.io/videos/performance_local.webm

this is 1M records, 3M operations on a single node, single thread, recorded in real time (1x).

I understand that without access to the source of the test program it's hard to trust, but we can arrange that if you decide to take that call :)

The question from most of us isn't "did you get that number," it's "what does that number actually mean?" Writes don't need to return any data, so you can sort of set that latency number arbitrarily by changing the meaning of "write done." I can make "redis with 0 write latency" by returning a "write done" immediately after the packet lands, but then the meaning of "write done" is effectively nil.

In every persistent database, that number indicates that an entry was written to a persistent write-ahead log and that the written value will stay around if the machine crashes immediately after the write. Clearly you don't do this because it's impossible to do in 600 ns. For most of the non-persistent databases (eg redis, memcached), write latency is about how long it takes for something to enter the main data structure and become globally readable. Usually, "write done" also means that the key is globally readable with no extra performance cost (ie it was not just dumped into a write-ahead log in memory and then returned).
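To make the three meanings of "write done" concrete, here is a minimal sketch of a toy store that can acknowledge a write at different points. Everything in it (the `ToyStore` class, the `ack` modes, the log format) is hypothetical and for illustration only; it is not how Redis, memcached, or HPKV are implemented.

```python
import os
import tempfile


class ToyStore:
    """Hypothetical key-value store with three meanings of 'write done'."""

    def __init__(self, wal_path):
        self.data = {}                  # main in-memory structure
        self.wal = open(wal_path, "ab") # write-ahead log on disk

    def write(self, key, value, ack):
        if ack not in ("received", "readable", "durable"):
            raise ValueError(f"unknown ack mode: {ack}")
        if ack == "received":
            # Acknowledge as soon as the request lands: nothing is
            # readable or durable yet, so the ack means almost nothing.
            return "ok"
        if ack == "durable":
            # Append to the log and force it to stable storage
            # before acknowledging.
            self.wal.write(f"{key}={value}\n".encode())
            self.wal.flush()
            os.fsync(self.wal.fileno())
        # "readable" and "durable": the key is visible to readers.
        self.data[key] = value
        return "ok"

    def close(self):
        self.wal.close()


with tempfile.TemporaryDirectory() as tmp:
    store = ToyStore(os.path.join(tmp, "wal.log"))
    store.write("a", "1", ack="received")  # fastest, weakest guarantee
    store.write("b", "2", ack="readable")  # in-memory store semantics
    store.write("c", "3", ack="durable")   # survives a crash after the ack
    print(sorted(store.data))
    store.close()
```

The measured "write latency" of the same store differs a lot depending on which `ack` mode is being timed, which is why the number alone says little.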

In a world where you spoke about the product more credibly or where the code was open-source, I might accept that this was the case. As it stands, it looks like:

1. This was your "marketing gimmick" number that you are trying to sell (every database that isn't postgres has one).

2. You got it primarily by compromising on the meaning of "write done," and not on the basis of good engineering.

Thank you for your thoughtful critique.

To clarify what our numbers actually mean and address your main question of "what does that number actually mean":

1- The 600ns figure represents precisely what you described - an in-memory "write done" where memory structures are updated and the data becomes globally readable to all processes. This is indeed comparable to what Redis without persistence or memcached provides. Even at this comparable measurement basis (which isn't our marketing gimmick, but the same standard used by in-memory stores), we're still 2-6x faster than Redis depending on access patterns.

For full persistence guarantees, our mean latency increases to 2582ns per record (600ns in-memory operation + 1982ns disk commit) for our benchmark scenario with 1M records and 100-byte values. This represents the complete durability cycle, and the fair comparison for it is, for example, Redis with AOF enabled.

2- I agree that the meaning of "write done" requires clear context. We've been focusing on the in-memory performance advantages in our communications without adequately distinguishing between in-memory and persistence guarantees.

We weren't trying to hide the disk persistence number. We simply used "write done" because our comparison was against Redis without persistence, but mentioning persistence alongside it understandably caused confusion. That was a mistake on our part.

Based on your feedback, we'll update our documentation to provide more precise metrics that clearly separate these operational phases and their respective guarantees.

UPDATE:

clarification on mean disk write measurement:

the mean value is the total time to flush the whole write buffer (processed in parallel, depending on the number of available CPU cores) divided by the number of records. the total time for processing and writing 1M records as described above was 1982ms, which makes the mean write time per record 1982ns.
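For readers following the arithmetic, here is a short sketch of that calculation using only the figures quoted in this thread (1982ms total flush, 1M records, 600ns in-memory write). The function name is made up for illustration. Note that the result is an amortized cost per record, not the time any individual record waited to reach disk.

```python
def mean_ns_per_record(total_flush_ms, num_records):
    """Amortized time per record: total flush time / record count."""
    total_ns = total_flush_ms * 1_000_000  # 1 ms = 1,000,000 ns
    return total_ns / num_records


disk_ns = mean_ns_per_record(1982, 1_000_000)
in_memory_ns = 600

print(disk_ns)                 # 1982.0
print(in_memory_ns + disk_ns)  # 2582.0
```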

> For full persistence guarantees, our mean latency increases to 2582ns per record (600ns in-memory operation + 1982ns disk commit)

By the way, this set of numbers also makes you look stupid, and you should consider redoing those measurements. No disk out there has less than 10 microseconds of write latency, and the ones in the cloud are closer to 50 us. Citing 2 micros here makes your 600 ns number also look 10x too optimistic.

I would suggest taking this whole thread as less of an opportunity to do marketing "damage control" and more of an opportunity to get honest feedback about your engineering and measurement practices. From the outside, they don't look good.

I also see the update in response to this comment, and it puts everything into perspective. You haven't changed the meaning of "write done," you have just been comparing your reciprocal throughput against Redis's latency, and I think you have been confusing those two.
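As a rough sketch of the difference: reciprocal throughput is time per operation averaged over many operations, and it converts directly into operations per second. Latency is how long one operation takes end to end, and the two are linked only through how many operations are in flight at once (Little's law). The function names below are hypothetical, and the 50 us latency is just the cloud-disk figure mentioned earlier in this thread, used as an example input.

```python
def qps_from_reciprocal_throughput(ns_per_op):
    """Operations per second implied by an average time per operation."""
    return 1e9 / ns_per_op


def ops_in_flight(qps, latency_ns):
    """Little's law: concurrency = throughput * latency."""
    return qps * latency_ns / 1e9


qps = qps_from_reciprocal_throughput(600)
print(round(qps / 1e6, 2))  # millions of operations per second

# If each write actually took 50 us end to end, sustaining that rate
# would require roughly this many writes overlapping at any moment:
print(round(ops_in_flight(qps, 50_000), 1))
```

A system can hit "600 ns per operation" on average while every single operation takes far longer than 600 ns, as long as enough of them overlap or are batched.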

"600 ns" then really means "1.6M QPS of throughput," which is a good number but is well within the capabilities of many similar offerings (including several databases that are truly persistent). It also says nothing about your latency. If you want to say you are 2-6x faster than Redis, you are going to have to compare that number to Redis's throughput.