by pauldix 3568 days ago
The tests were run on the same hardware, a single server. Bare metal, not VMs. InfluxDB writes the series string with everything. We tried to imitate what you'd need to do in Cassandra to get close to the time series functionality InfluxDB provides.

If you're just going to write a bunch of uint64 keys with float64 values, of course Cassandra will get much faster. It would be trivial to make a time series database that outperforms InfluxDB with those limitations as well.
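
To make the size difference concrete, here is a minimal sketch comparing a point written with its full series string against a bare uint64 key and float64 value. The byte layout, the `#` field separator, and the function names `encode_with_series` and `encode_bare` are all assumptions made up for illustration. This is not InfluxDB's or Cassandra's actual storage format.

```python
import struct


def encode_with_series(measurement, tags, field, value, ts_ns):
    """Encode a point together with its full series string (hypothetical layout)."""
    # Sort the tags so the same series always produces the same string.
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    series = f"{measurement},{tag_str}#{field}".encode("utf-8")
    # 2-byte length prefix, series bytes, int64 timestamp, float64 value.
    return struct.pack(">H", len(series)) + series + struct.pack(">qd", ts_ns, value)


def encode_bare(key, value):
    """Encode only a uint64 key and a float64 value, with no series metadata."""
    return struct.pack(">Qd", key, value)


point = encode_with_series(
    "cpu",
    {"host": "server01", "region": "us-west"},
    "usage",
    0.64,
    1_465_839_830_100_400_200,
)
bare = encode_bare(1, 0.64)
print(len(point), len(bare))
```

The bare record is always 16 bytes, while the record carrying the series string grows with the measurement, tag, and field names, so a benchmark that writes only the bare form is doing less work per point.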

The point of the comparison is that InfluxDB gives you a ton of functionality out of the box and has great performance.

Again, the point is that if you want to do time series on Cassandra, you're going to write a bunch of the code yourself.


> The point of the comparison is that InfluxDB gives you a ton of functionality out of the box and has great performance. [...] if you want to do time series on Cassandra, you're going to write a bunch of the code yourself.

Fair enough. I'm sure InfluxDB is very good/fast at time series data (although I have to admit to not actually having tried it out so far). Still, if that was your point, consider removing these statements from the blog.

> InfluxDB outperformed Cassandra by 4.5x when it came to data ingestion.

> InfluxDB outperformed Cassandra by delivering 10.8x better compression.

> InfluxDB outperformed Cassandra by delivering up to 168x better query performance.

I think it would help make the point and not put the reader in a defensive position (when the statements are clearly not based on a fair comparison of the two products and will not hold under most conditions). Just my two cents.

Maybe, but we get asked all the time about Cassandra vs. us. Both in terms of feature set and performance. And performance only makes sense for our potential users if we're trying to replicate the features on Cassandra.

Hasn't that work already been done? Cyanite and KairosDB both plug into the broader Graphite ecosystem (more or less) and use Cassandra as a data store.

Time series data has also been a particular focus in the Cassandra community. DTCS was too complicated, so they came up with the easier and faster TWCS. I don't think this is on you, but I'd love to see a comparison with the latest stable 3.x and a multi-node cluster.

We'll be doing comparisons against Kairos and OpenTSDB in the coming months. We just get asked about Cassandra specifically quite a bit.

If you're testing those, it would be nice if you could test and make a comparison with the Cassandra-based Blueflood as well.

https://github.com/rackerlabs/blueflood/wiki

If you want to test Cassandra, please test at least 9 nodes and have someone with Cassandra setup experience configure your cluster.