by halfmatthalfcat 2143 days ago
Prometheus and Grafana are awesome; I use them personally for all my monitoring.

However, I’m still trying to nail down my story for high-cardinality, highly unique, metrics-like data. What are people using?

I’ve heard a combination of Cassandra/BigTable and Spark as a potential solution?

7 comments

I found this interesting. My plan is to move from Prometheus to VictoriaMetrics.

https://medium.com/@valyala/measuring-vertical-scalability-f...

Just a heads up, this is an old comparison (over a year old) that hasn't been updated since TimescaleDB added native compression. (The blog post references TimescaleDB 1.2.2; the product is now on 1.7.2.)
Woof, good luck. Not a great product.
Care to elaborate? At least a slight mention of why.
TimescaleDB is a long-term storage option for Prometheus metrics, has no problem with high cardinality, and now natively supports PromQL (in addition to SQL). [0]

(Disclaimer: I work at Timescale)

[0] https://github.com/timescale/timescale-prometheus

I'm just starting to look into this and have a question. If I can export my metrics directly to TimescaleDB and it supports visualization with Grafana, is there any reason to go through Prometheus?
Good question. The advantage of Prometheus is the ability to scrape from a variety of endpoints (seems like more and more things are exposing the Prometheus format).

There are some who write metrics directly to TimescaleDB, while others prefer going through Prometheus to take advantage of that ecosystem.

Best part: We support both!

I'd be curious to hear if anyone has done a serious evaluation of high-cardinality use cases with VictoriaMetrics.
I went from an InfluxDB instance that was getting crushed to VictoriaMetrics running in a container with 1/8th the resources, and it works fine with an active cardinality of 1.5M. It could probably handle a lot more. Auto fill in Grafana breaks, but oh well!
Honeycomb https://honeycomb.io/ is laser focused on this stuff. They built their own datastore (similar to Druid but schemaless) so they could create the experience they were aiming for.

They talk a lot about collaborative troubleshooting, and the user interface reflects that. It's actually fun (?!) to drill down from heatmaps to individual events with Honeycomb's little comparison charts lighting the way.

I've used Druid (druid.io) in the past and it worked well, but it's a lot of trouble to set up and tune. I haven't tried ClickHouse, but it looks good and has approximate aggregations for high-cardinality dimensions.
Druid truly is still king in this space. The setup has become less onerous over time. It handles arbitrarily high cardinality and dimensionality with ease and its support for sketching algorithms leaves other similar systems (especially Prometheus) in the dust.
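
To illustrate the general idea behind sketching and approximate aggregations (this is not how Druid or ClickHouse implement it), here is a minimal K-Minimum-Values sketch for approximate distinct counts. The class name `KMVSketch` and the default `k=256` are assumptions for the example. It keeps only the `k` smallest hash values seen, so memory stays fixed no matter how many distinct values stream through.

```python
import hashlib
import heapq


class KMVSketch:
    """K-Minimum-Values sketch: approximate distinct count in O(k) memory."""

    def __init__(self, k=256):
        self.k = k
        self._heap = []        # negated hashes, so the heap root is the largest kept hash
        self._members = set()  # hashes currently kept, used to ignore duplicates

    @staticmethod
    def _hash(value):
        # Map a string to a float in [0, 1) using the first 8 bytes of SHA-1.
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, value):
        h = self._hash(value)
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest kept one: swap it in.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            # Fewer than k distinct hashes seen, so the count is exact.
            return float(len(self._heap))
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


if __name__ == "__main__":
    sketch = KMVSketch()
    for i in range(20000):
        sketch.add(f"user-{i}")
        sketch.add(f"user-{i}")  # duplicates do not change the estimate
    print(round(sketch.estimate()))
```

The relative error shrinks roughly as 1/sqrt(k), so a larger `k` trades memory for accuracy.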
Spark has worked decently for me if you need to be cloud agnostic.

Currently I’m in AWS land, and Athena has been mostly working for what I need, but I haven’t really pushed it that hard yet.

Just curious what your numbers are? Unique metrics, cardinality per metric, ingest rate, expected query ranges?
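
If it helps to pin down the "cardinality per metric" number, one rough way is to count distinct label sets per metric name from a sample of your data. A minimal sketch, assuming samples arrive as `(metric_name, labels_dict)` pairs; the function name and the example metrics are hypothetical:

```python
from collections import defaultdict


def series_per_metric(samples):
    """Count distinct label combinations (series) for each metric name."""
    seen = defaultdict(set)
    for name, labels in samples:
        # A series is identified by its metric name plus its full label set.
        seen[name].add(frozenset(labels.items()))
    return {name: len(label_sets) for name, label_sets in seen.items()}


if __name__ == "__main__":
    samples = [
        ("http_requests_total", {"path": "/a", "status": "200"}),
        ("http_requests_total", {"path": "/a", "status": "500"}),
        ("http_requests_total", {"path": "/a", "status": "200"}),  # repeat
        ("queue_depth", {"queue": "email"}),
    ]
    print(series_per_metric(samples))
```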
I have an instance that scrapes about 30K targets for 15 million metrics and it works better than you'd expect. The biggest performance issue we have is rendering the targets page.

We have a plan to split it down to fewer instances per node, but it's worked well enough so far.