Hacker News
by soundoflight 3564 days ago
From using InfluxDB (up to v0.10 I think it was), it's a great database but performance REALLY depends on the cardinality of your data.

I can't stress it enough: calculate your cardinality before switching over to it. If your cardinality looks good, InfluxDB is a perfect, logical choice. I really enjoyed it, and it is dirt simple to figure out. We had a junior dev, just out of college with little experience, set it up and become proficient in a matter of hours.
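One rough way to "calculate your cardinality" before migrating is to take a sample of your existing data and count the distinct measurement + tag set combinations, since each combination becomes its own series. The sketch below is a hypothetical helper, not part of InfluxDB; it assumes your points are available as `(measurement, tags)` pairs, and it ignores the database and field-key dimensions that the InfluxDB docs also fold into series cardinality. The second function gives the worst-case bound (product of distinct values per tag key), which applies when tag values vary independently.

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[str, Dict[str, str]]  # hypothetical shape: (measurement, tag key -> tag value)

def series_cardinality(points: Iterable[Point]) -> int:
    """Count distinct (measurement, tag set) combinations in a data sample."""
    seen = set()
    for measurement, tags in points:
        # Tag order doesn't matter, so normalize each tag set to a sorted tuple.
        seen.add((measurement, tuple(sorted(tags.items()))))
    return len(seen)

def cardinality_upper_bound(tag_values: Dict[str, List[str]]) -> int:
    """Worst case: product of the number of distinct values for each tag key."""
    bound = 1
    for values in tag_values.values():
        bound *= len(set(values))
    return bound

sample = [
    ("cpu", {"host": "a", "region": "us"}),
    ("cpu", {"region": "us", "host": "a"}),  # same series as the first point
    ("cpu", {"host": "b", "region": "us"}),
    ("cpu", {"host": "c", "region": "eu"}),
]
print(series_cardinality(sample))  # 3
print(cardinality_upper_bound({"host": ["a", "b", "c"], "region": ["us", "eu"]}))  # 6
```

If the measured count stays far below the upper bound, your tags are correlated; if it tracks the bound and keeps growing as you sample more data, that is the warning sign described above.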

Edit: I should point out that I was writing about 10 million records a day to my db (hosted on a Mac Mini in development!) with a 2-week sliding window. I was pushing the data from InfluxDB into custom D3 visualizations, and I cached certain queries in Redis so I wasn't hitting InfluxDB on every read request.
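At roughly 10 million records a day, a 2-week window holds on the order of 140 million points, so keeping repeated dashboard reads off the database helps. The caching described above is the usual cache-aside pattern: check the cache, and only query the database on a miss or after the entry expires. This is a minimal sketch, not the poster's code: a plain dict with expiry timestamps stands in for Redis, `fetch` is a hypothetical function that would run the real InfluxDB query, and the TTL value is an assumption.

```python
import time
from typing import Callable, Dict, List, Tuple

class QueryCache:
    """Cache-aside wrapper; a dict with expiry times stands in for Redis (assumption)."""

    def __init__(self, fetch: Callable[[str], List], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # hypothetical: runs the query against InfluxDB
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, List]] = {}

    def get(self, query: str) -> List:
        now = self._clock()
        hit = self._entries.get(query)
        if hit is not None and hit[0] > now:
            return hit[1]            # fresh cache hit: the database is not touched
        result = self._fetch(query)  # miss or expired entry: query the database
        self._entries[query] = (now + self._ttl, result)
        return result

# Usage with a fake fetch function and a manual clock.
calls = []
now = [0.0]

def fake_influx(query: str) -> List:
    calls.append(query)
    return [query.upper()]

cache = QueryCache(fake_influx, ttl_seconds=60, clock=lambda: now[0])
cache.get("select mean(value) from edges")
cache.get("select mean(value) from edges")  # served from cache
print(len(calls))  # 1
```

With a real Redis client the same logic maps onto a key read followed by a write with an expiry, which lets Redis handle eviction instead of the dict.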


We're working on the cardinality problem. Will be resolved in an upcoming release. Moving the index over to a disk based format that will hopefully still be fast and not sacrifice lookup performance.
Can you explain the cardinality problem in a bit more detail? It's come up more than once in this thread.
https://docs.influxdata.com/influxdb/v1.0/concepts/glossary/...

You want to keep the number of distinct values you are indexing/tagging on low. As an example from my situation, I was tracking what amounted to connections between nodes in a very large tree. I had a lot of distinct pairs, which meant I had high cardinality. As cardinality increases, a query that used to take a millisecond can end up taking a couple of seconds.
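A quick bit of arithmetic shows why tagging on node pairs hurts so much more than tagging on single nodes. The node counts below are hypothetical (the poster didn't give numbers), and the pair figure is a worst case: it assumes connections can span arbitrary node pairs. Strict parent/child edges in a tree number only n - 1.

```python
def node_tag_series(n_nodes: int) -> int:
    """Series count when each point is tagged with a single node ID."""
    return n_nodes

def pair_tag_series(n_nodes: int, directed: bool = True) -> int:
    """Worst-case series count when tagging on a (source, destination) node pair."""
    ordered_pairs = n_nodes * (n_nodes - 1)
    return ordered_pairs if directed else ordered_pairs // 2

for n in (100, 1_000, 10_000):
    print(n, node_tag_series(n), pair_tag_series(n))
# 100 100 9900
# 1000 1000 999000
# 10000 10000 99990000
```

Single-node tags grow linearly, while pair tags grow roughly with the square of the node count, which is how a workload that looked fine in testing can cross into the slow-query regime as the tree grows.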

So InfluxDB v1.0 has issues when the cardinality of the "primary key" (or candidate keys) gets high?

At what level of keys or tags did you start to see query performance become problematic?

Good to hear! I have a project coming up soon that I want to use it on.