Hacker News new | ask | show | jobs
by brianwawok 3571 days ago
I don't understand this benchmark at all. It says performance of a 1000 node cluster, but then shows 100k inserts per second in Cassandra. Then later follow up comments say that this test was on a single machine. Without seeing the schema, 100k inserts / sec is reasonable for a single machine. For 1000 machines it would mean there is a pretty massive configuration issue.

If you are going to benchmark a distributed system, you really need to set up more than 1 server.

(Disclaimer - work at Datastax)

2 comments

This confused me, too.

I think what they meant with "1000 nodes" is that the dataset they're using for the benchmark is synthetic monitoring data (where the thing being monitored are servers).

And the way they generated the synthetic data set is by having 1000 imaginative servers produce one sample per second, (i.e. have a script that writes out 1000 * duration_in_sec fake samples -- I believe this is the code that does it https://github.com/influxdata/influxdb-comparisons/tree/mast...)

Makes sense.

Posting 1 node benchmarks of distributed databases seems suboptimal.

Does it ever make sense to use Cassandra on a single node for anything but dev/test?

I am under the impression that Cassandra's performance comes from its distribution capabilities.

It does not make sense to only use 1 node. It's not designed to be a fast 1 node DB.

In fact for most dev I use 3 nodes on my laptop, and most of our "unit" tests are multi-node as well (closer to integration tests by most measures).