| HN Mirror

by bbromhead 4057 days ago

So their benchmark of Cassandra against BigTable doesn't even match their previous benchmark of Cassandra.

http://googlecloudplatform.blogspot.com/2014/03/cassandra-hi...

How did the latency for Cassandra on their cloud platform increase by 200ms from a year ago?

ivansmf 4057 days ago

I wrote last year's benchmark. The clusters are completely different, and so is the workload. Last year's cluster had 300 VMs, which was a much higher price point, and the workload was write-only. This benchmark uses YCSB workloads A and B, which we thought match the usage we'll see on BigTable. The cluster is much smaller as well. I shared my scripts from last year; it is pretty easy (although a bit expensive) to repro the numbers. Let me check if we can share this year's benchmark scripts as well.
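For anyone who hasn't used YCSB: core workload A is a 50% read / 50% update mix, and workload B is 95% read / 5% update. The snippet below is only a rough sketch of how different those operation mixes are from a write-only run. It is not YCSB code; `generate_ops`, the `WORKLOADS` table and the `write_only` entry are hypothetical names for illustration.

```python
import random
from collections import Counter

# Read/update proportions of YCSB core workloads A and B.
# "write_only" is a hypothetical stand-in for last year's write-only run.
WORKLOADS = {
    "A": {"read": 0.50, "update": 0.50},
    "B": {"read": 0.95, "update": 0.05},
    "write_only": {"insert": 1.0},
}


def generate_ops(workload, n, seed=42):
    """Draw n operation types according to the workload's proportions."""
    mix = WORKLOADS[workload]
    rng = random.Random(seed)  # seeded so repeated runs give the same sequence
    return rng.choices(list(mix), weights=list(mix.values()), k=n)


for name in WORKLOADS:
    counts = Counter(generate_ops(name, 10_000))
    print(name, dict(counts))
```

A read-heavy mix like B exercises the read path (and its compaction-sensitive latency) far more than an insert-only run, which is one reason the two years' numbers aren't directly comparable.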

bbromhead 4057 days ago

I'm pretty surprised by the difference in latency, though; throughput, as you say, will differ because of the number of nodes.

For any given replication factor in Cassandra, the overhead remains pretty much the same whether you have 300 nodes or 3. So should the latency.
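A toy model of the argument above, assuming a write is sent to RF replicas and the coordinator waits for the number of acks the consistency level requires (QUORUM being RF // 2 + 1). This is a simplification for illustration, not Cassandra's code; `write_fanout` and `ACKS` are hypothetical names, and it ignores coordinator load, network topology and compaction.

```python
# Acks the coordinator waits for at each consistency level, given RF.
ACKS = {
    "ONE": lambda rf: 1,
    "QUORUM": lambda rf: rf // 2 + 1,
    "ALL": lambda rf: rf,
}


def write_fanout(rf, cluster_size, consistency="QUORUM"):
    """Return (replicas written, acks awaited) for one write.

    cluster_size is only used for validation: the per-write fan-out
    depends on RF and the consistency level, not on node count.
    """
    if rf > cluster_size:
        raise ValueError("replication factor exceeds cluster size")
    return rf, ACKS[consistency](rf)


for nodes in (3, 300):
    replicas, acks = write_fanout(3, nodes)
    print(f"{nodes} nodes: write goes to {replicas} replicas, waits for {acks} acks")
```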

On top of that, both BigTable and Cassandra use SSTables to store data on disk (with all the compactiony goodness that goes with them), so I'm even more surprised that the latency difference is so huge.

Would love to see the scripts for the benchmarks! I don't want to take away from a great product launch and I'm sure BigTable kicks arse in certain areas that Cassandra doesn't... I'm just surprised at the differences in latency.

gtaylor 4057 days ago

Without knowing a lot more about their benchmark environment this go around, these bold statements are just about useless. Let's hope further details follow.

Worst case, people are going to benchmark this independently and hopefully do a better job being transparent.

vgt 4057 days ago

The gentleman who produced these benchmarks replied directly to this thread. He also has been very open with sharing his scripts and setups, so that you can reproduce it yourself. He encourages it actually!

gtaylor 4056 days ago

It doesn't look like he actually shared the scripts for this year's benchmarks, unless I am missing something.

That's what I'd be looking for, not so much some basics on the clusters and the workload.

gtaylor 4057 days ago

I may have missed something obvious, but can you link the reply? I'm having difficulty finding it with all of the other comments in here.

crb 4056 days ago

https://news.ycombinator.com/item?id=9500512

vicaya 4056 days ago

You must be looking at the median latencies. The 99th-percentile latency was, and still is, over 200ms. You can blame GC jitter for the much bigger variance. They should also show median and 95th-percentile latencies for this year's numbers.
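To illustrate why the median and the 99th percentile can tell such different stories, here is a small sketch using made-up latency samples (mostly fast requests plus a handful of GC-pause-sized outliers) and a nearest-rank percentile. The numbers are invented for illustration and are not from either benchmark; `percentile` is a hypothetical helper.

```python
def percentile(samples, p):
    """Nearest-rank percentile; p is an integer from 1 to 100."""
    if not samples or not 1 <= p <= 100:
        raise ValueError("need samples and 1 <= p <= 100")
    ordered = sorted(samples)
    rank = -(-p * len(ordered) // 100)  # ceil(p * n / 100) using integer math
    return ordered[rank - 1]


# Hypothetical latencies in ms: 98.5% fast requests, 1.5% stalled by GC pauses.
latencies = [5.0] * 985 + [250.0] * 15

for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies, p)} ms")
print(f"mean: {sum(latencies) / len(latencies):.3f} ms")
```

With these samples, p50 and p95 are both 5.0 ms while p99 jumps to 250.0 ms, so comparing one year's median against another year's 99th percentile would show a gap of roughly 200ms even if nothing changed.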
