Hacker News new | ask | show | jobs
by dpezely 2717 days ago
How does your workload and server spec compare to theirs?

Following links to the OSDI'18 paper [1] gives this overview:

"Setup. In all experiments, Noria and other storage backends run on an Amazon EC2 c5.4xlarge instance with 16 vCPUs; clients run on separate c5.4xlarge instances unless stated otherwise."

That paragraph goes on to explain the nature of the workload.

Their Figure 2 gives example SQL commands.

[1] https://jon.tsp.io/papers/osdi18-noria.pdf

3 comments

He was showing a microbenchmark of docs.rs/evmap and not of Nora. This is a load test on the local machine: https://github.com/jonhoo/rust-evmap/tree/master/benchmark

His read throughput is linear to the number of cores, which means that it must have been run with the same number of cores to threads. Otherwise context switching and sharing of cores would have resulted in sublinear growth (you can get superlinear growth, but it won't look like that). Assuming his numbers are not fake, his github charts indicate a 32+ core machine.

My 2015 benchmarks were on a Azure G4, Xeon E5-2698B v3 @ 2.00GHz (16 core, hyperthreading disabled), 224 GB, Ubuntu 15.04. They show a slight superlinear growth for reads (Caffeine). https://github.com/ben-manes/caffeine/wiki/Benchmarks#server...

Both benchmarks use a Zipf distribution. I believe he said it only supports single-writer semantics, whereas mine is roughly per-entry (per hashbin). So reads should be safe to compare, and we can ignore his slow write rate. His should be faster as I do more work to maintain complex eviction policies, whereas his can reading a pseudo-immutable map with no eviction.

Since he does not perform any cache operations, as it is a concurrent map, we should be comparing against another unbounded concurrent map. That is over 1 billion reads per second on the above hardware. That is why I think something is miserably wrong or I'm misunderstanding something fundamental. Achieve 25M/s is a very disappointing result.

I'm not the author but you should probably ask there and not on HN.

That said:

> Since he does not perform any cache operations, as it is a concurrent map

Why is it a concurrent map? To me it seems to be a regular rust hashtable. Unless you are going to tweak something most of this is going to be the very slow hashing algorithm.

His chart is very wrong: https://github.com/jonhoo/rust-evmap/blob/master/benchmark/r...

Both Zipf (skewed) and Uniform distributions have the same read throughput. That would not occur because of hardware caches and memory effects. A Zipf means that some entries are read much, much more often (80/20 rule, hot/cold cache entries). The hardware caches should benefit from this and that is how you get superlinear effects.

A uniform is the same as a random access, e.g. all eviction policies would degrade to random replacement as there is no recency/frequency/locality benefits. This means if the dataset cannot reside entirely within the CPU cache then we should see a different throughput. Yet, they are the same.

His benchmarks only have 10k elements so maybe it all fits in the cpu cache. If that's true then 25M/s is absolutely awful. Something is wrong.

It is not lock-free. Every time he reads he acquires a lock by calling epochs.lock(). That may not explain the numbers entirely, but this benchmark is completely bogus.
> Every time he reads he acquires a lock by calling epochs.lock().

Where do you see this? I only see it lock on refresh.

The benchmark allocates a new ReadHandle on every get(key) operation, whose constructor performs work under a shared lock. This is a design flaw of the benchmark, not the library.
I'm not sure where you get that from? The benchmark code clones one read handle for each read thread at the beginning, which is then used for the entire benchmark: https://github.com/jonhoo/rust-evmap/blob/d307999c1ad78d10ec...