I wrote last year's benchmark. The clusters are completely different, and so is the workload.
Last year's cluster had 300 VMs, which put it at a much higher price point, and the workload was write-only.
This benchmark uses YCSB workloads A and B, which we thought matched the usage we'll see on BigTable. The cluster is also much smaller.
I shared my scripts from last year; it is pretty easy (although a bit expensive) to reproduce the numbers. Let me check whether we can share this year's benchmark scripts as well.
I'm pretty surprised by the difference in latency, though. Throughput, as you say, will differ because of the number of nodes.
For any given replication factor in Cassandra, the per-request overhead stays pretty much the same whether you have 300 nodes or 3. The same should hold for latency.
On top of that, both BigTable and Cassandra use SSTables to store data on disk (with all the compactiony goodness that goes with them), so I'm even more surprised that the latency gap is so huge.
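To make the replication argument concrete, here's a rough sketch under a deliberately simplified model (the helper names are made up, and it ignores coordinator hops, rack/DC topology and load). Each write is sent to RF replicas, and a QUORUM request waits for floor(RF/2) + 1 acks, so the per-request fan-out depends on RF, not on how many nodes are in the ring.

```python
def replicas_per_write(replication_factor: int, cluster_size: int) -> int:
    """Replicas a single write is sent to (simplified, hypothetical model)."""
    if cluster_size < replication_factor:
        raise ValueError("cluster must have at least RF nodes")
    # Fan-out is fixed by RF; extra nodes only spread *different* keys around.
    return replication_factor


def quorum_acks(replication_factor: int) -> int:
    """Acks a QUORUM request waits for: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


if __name__ == "__main__":
    rf = 3
    for nodes in (3, 300):
        # Same fan-out and same quorum size for a 3-node and a 300-node ring.
        print(nodes, replicas_per_write(rf, nodes), quorum_acks(rf))
```

More nodes buy you more aggregate throughput, but each individual request still touches the same number of replicas.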
Would love to see the scripts for the benchmarks! I don't want to take away from a great product launch and I'm sure BigTable kicks arse in certain areas that Cassandra doesn't... I'm just surprised at the differences in latency.
Without knowing a lot more about their benchmark environment this go around, these bold statements are just about useless. Let's hope further details follow.
Worst case, people are going to benchmark this independently and will hopefully do a better job of being transparent.
The gentleman who produced these benchmarks replied directly in this thread. He has also been very open about sharing his scripts and setups so that you can reproduce the results yourself. He actually encourages it!
You must be looking at the median latencies. The 99th-percentile latency was, and still is, over 200 ms. You can blame GC jitter for the much bigger variance. They should also publish median and 95th-percentile latencies for this year's numbers.
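Here's a quick illustration of why the median can hide this. It's a small nearest-rank percentile sketch fed with made-up latencies (not from either benchmark): 98% of requests take 5 ms and 2% stall behind a hypothetical 250 ms GC pause. The median and p95 both come out at 5 ms, while p99 comes out at 250 ms.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples <= it."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need samples and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical latencies in ms: 980 fast requests, 20 stuck behind a GC pause.
    latencies = [5.0] * 980 + [250.0] * 20
    for p in (50, 95, 99):
        print(f"p{p}: {percentile(latencies, p)} ms")
```

That's why a single headline latency number isn't enough to compare systems with very different pause behaviour.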