by ting0 34 days ago
Do you think Prometheus + Grafana is the way to go?

Really depends on the use case. Home lab? Probably.

Production? As soon as you scale, you need a proper solution. Prometheus by itself doesn't scale; you need Mimir or Thanos (or similar).

Clickhouse (the "clickstack") seems to be the new kid on the block. Looks very promising.

Note that ClickHouse is quite old (2010ish?), but it has always been a "web server access log analytics" solution. The pivot to "we do observability too" is new; we'll see how that plays out. I'm not terribly optimistic given how badly a similar pivot went for Elastic, but who knows.

ClickHouse is just a database, but it has a really neat feature: infrequently accessed data can be offloaded to S3, minimizing storage costs. It also heavily compresses the data when storing it.
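
To illustrate why telemetry columns compress so well, here is a minimal Python sketch of delta encoding followed by a general-purpose compressor. It is not ClickHouse's implementation. ClickHouse exposes similar ideas as per-column codecs (for example `Delta` or `DoubleDelta` combined with `LZ4` or `ZSTD`), but the helper names and the sample data below are hypothetical.

```python
import struct
import zlib


def encode_deltas(timestamps: list[int]) -> bytes:
    """Keep the first value, then store each value's difference from the previous one."""
    if not timestamps:
        return b""
    deltas = [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]
    return struct.pack(f"<{len(deltas)}q", *deltas)


def decode_deltas(blob: bytes) -> list[int]:
    """Rebuild the original values with a running sum over the stored deltas."""
    deltas = struct.unpack(f"<{len(blob) // 8}q", blob)
    values, running = [], 0
    for delta in deltas:
        running += delta
        values.append(running)
    return values


# Hypothetical data: one day of samples scraped every 15 s, timestamps in milliseconds.
start_ms = 1_700_000_000_000
timestamps = [start_ms + i * 15_000 for i in range(5_760)]

raw = struct.pack(f"<{len(timestamps)}q", *timestamps)  # 8 bytes per sample
plain = zlib.compress(raw)                              # compressor only
delta = zlib.compress(encode_deltas(timestamps))        # delta encoding + compressor

print(f"raw={len(raw)} B, zlib={len(plain)} B, delta+zlib={len(delta)} B")
```

With a regular scrape interval the delta column is thousands of identical values, so the delta-encoded variant should come out far smaller than compressing the raw timestamps alone.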

I am the creator of Traceway, and ClickHouse is my all-time favorite database. Those storage properties (cheap offloading to S3 and heavy compression) are why it's uniquely suited to telemetry data and why I used it as the base of Traceway. That said, the repositories in Traceway are completely modular: I've implemented a SQLite version so that I can skip Docker containers locally and to simplify self-hosting for side projects (it runs on $2 servers without issues).
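
Keeping the storage engine swappable behind a common interface is essentially the repository pattern. A minimal Python sketch, assuming a hypothetical span store: the names `Span`, `SpanRepository`, `SQLiteSpanRepository` and `slowest_endpoints` are illustrative and not taken from Traceway's code base, and only a SQLite backend is shown.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float


class SpanRepository(Protocol):
    """Storage-agnostic interface; a ClickHouse-backed class could implement the same methods."""

    def insert(self, span: Span) -> None: ...

    def slowest(self, limit: int) -> list[Span]: ...


class SQLiteSpanRepository:
    """SQLite backend: no containers needed, a single file (or memory) is enough."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS spans (trace_id TEXT, name TEXT, duration_ms REAL)"
        )

    def insert(self, span: Span) -> None:
        with self.conn:  # commits the transaction on success
            self.conn.execute(
                "INSERT INTO spans VALUES (?, ?, ?)",
                (span.trace_id, span.name, span.duration_ms),
            )

    def slowest(self, limit: int) -> list[Span]:
        rows = self.conn.execute(
            "SELECT trace_id, name, duration_ms FROM spans ORDER BY duration_ms DESC LIMIT ?",
            (limit,),
        ).fetchall()
        return [Span(*row) for row in rows]


def slowest_endpoints(repo: SpanRepository, limit: int = 2) -> list[str]:
    """Application code depends only on the interface, not on the storage engine."""
    return [span.name for span in repo.slowest(limit)]


repo = SQLiteSpanRepository()
for span in [
    Span("t1", "GET /health", 2.0),
    Span("t2", "POST /checkout", 480.0),
    Span("t3", "GET /cart", 95.5),
]:
    repo.insert(span)

print(slowest_endpoints(repo))  # ['POST /checkout', 'GET /cart']
```

Because `slowest_endpoints` only sees the `SpanRepository` protocol, moving to another database means writing one new class rather than touching the callers.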

They acquired HyperDX because it was a major ClickHouse user: its whole platform was telemetry on top of ClickHouse. I hope they don't fully pivot into the space, as it would be quite awkward, but there are alternatives, and I can always rewrite the repositories for a different storage engine/DB.

I thought observability was pushed onto ClickHouse by other stacks that chose it as their recommended database for observability (SigNoz springs to mind, but they were not the only one).

VictoriaMetrics CTO here.

We at VictoriaMetrics took another approach. We tried using ClickHouse as a database for metrics in 2017, but then decided to implement a specialized database for metrics. This database uses ideas from the ClickHouse architecture to achieve the best performance and the lowest resource usage. The main difference between ClickHouse and VictoriaMetrics is that VictoriaMetrics is optimized solely for typical observability tasks. It supports all the popular data ingestion protocols, and it provides a PromQL-compatible querying API, a Graphite-compatible querying API, Prometheus-compatible service discovery and relabeling, and Prometheus-compatible alerting and recording rules. It provides a built-in web UI for quick exploration and analysis of the ingested metrics, including the ability to investigate the source of high cardinality. It consists of a single small executable (~20MB) without external dependencies, with minimal configuration and maintenance. See https://altinity.com/wp-content/uploads/2021/11/How-ClickHou... for more details.
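
Because the querying API is Prometheus-compatible, a client written against the Prometheus HTTP API (`/api/v1/query`) should work unchanged. A minimal Python sketch, assuming a single-node instance at `http://localhost:8428` (the usual default port, worth double-checking for your deployment) and a hypothetical metric `http_requests_total`. The response is a canned example in the Prometheus response format, so the parsing can be shown without a running server.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus-style instant query URL; urlencode escapes the PromQL."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str) -> dict[tuple[tuple[str, str], ...], float]:
    """Turn an instant-vector response into {sorted label pairs: value}."""
    payload = json.loads(body)
    if payload["status"] != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    result = {}
    for item in payload["data"]["result"]:
        labels = tuple(sorted(item["metric"].items()))
        _timestamp, value = item["value"]  # the sample value arrives as a string
        result[labels] = float(value)
    return result


url = instant_query_url(
    "http://localhost:8428", "sum by (status) (rate(http_requests_total[5m]))"
)

# Canned response in the Prometheus HTTP API format (hypothetical numbers).
body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"status": "200"}, "value": [1700000000, "41.5"]},
            {"metric": {"status": "500"}, "value": [1700000000, "0.25"]},
        ],
    },
})

print(url)
print(parse_vector(body))
```

Pointing the same code at a Prometheus server would only change the base URL, which is the practical meaning of "Prometheus-compatible" here.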

We used the same approach to build VictoriaLogs, a specialized database for logs. It uses the most appropriate architecture ideas from ClickHouse to achieve high performance and low resource usage. It is schemaless and zero-config. It consists of a single small executable without external dependencies. It accepts logs via popular data ingestion protocols. It provides LogsQL, a specialized query language for typical queries over production logs that is much simpler to use than SQL for this job. It provides a built-in web UI for quick exploration of the ingested logs, a Grafana plugin for building arbitrarily complex dashboards from the stored logs, and the ability to build alerts and metrics from the stored logs. See https://docs.victoriametrics.com/victorialogs/faq/#what-is-t...
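
Schemaless ingestion means each log record can carry its own set of fields. A minimal Python sketch that serializes such records as newline-delimited JSON, the shape assumed here for VictoriaLogs' JSON-lines ingestion. The `_msg` and `_time` field names follow VictoriaLogs' conventions for the log message and timestamp; the helper name `to_json_lines` and the records are hypothetical, and the exact ingestion endpoint and its query parameters should be checked in the VictoriaLogs docs before sending anything.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def to_json_lines(records: list[dict], now: Optional[datetime] = None) -> str:
    """Serialize log records as newline-delimited JSON; records need not share fields."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    lines = []
    for record in records:
        if "_msg" not in record:
            raise ValueError("every record needs a '_msg' field")
        event = {"_time": stamp, **record}  # a record's own _time overrides the default
        lines.append(json.dumps(event, separators=(",", ":")))
    return "\n".join(lines) + "\n"


# Hypothetical records: the two events deliberately have different fields.
records = [
    {"_msg": "payment failed", "level": "error", "order_id": 42},
    {"_msg": "cache warmed", "level": "info", "keys": 1200, "region": "eu-west-1"},
]
fixed = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(to_json_lines(records, now=fixed), end="")
```

No table definition or mapping is declared anywhere; the extra fields (`order_id`, `region`, ...) simply travel with each record.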

This is refreshing to see, an actual comment about VictoriaMetrics that explains why and when it's good. Thank you!

I am the creator of Traceway. After reading that, I think we're in adjacent verticals.

Traceway aims to give teams without dedicated SREs preconfigured SLOs, a preconfigured dashboard and really powerful exception tracking. It integrates with Git to open issues automatically, as well as with Slack and others. The idea is to bring the Datadog/Sentry experience to the open source space.
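
Preconfigured SLOs largely come down to error-budget arithmetic. A minimal Python sketch, assuming a request-based availability SLO; the function names and traffic numbers are hypothetical and not Traceway's implementation.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full downtime the SLO tolerates over the window."""
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    allowed_failures = (1 - slo) * total_requests
    return 1 - failed_requests / allowed_failures


def burn_rate(slo: float, total_requests: int, failed_requests: int) -> float:
    """Observed error rate divided by the error rate the SLO allows (1.0 = exactly on budget)."""
    return (failed_requests / total_requests) / (1 - slo)


# A 99.9% SLO over 30 days allows about 43.2 minutes of downtime.
print(round(error_budget_minutes(0.999), 1))  # 43.2

# Hypothetical traffic: 1,000,000 requests with 400 failures.
print(round(budget_remaining(0.999, 1_000_000, 400), 3))  # 0.6
print(round(burn_rate(0.999, 1_000_000, 400), 3))  # 0.4
```

A burn rate above 1.0 means the budget will run out before the window ends, which is the usual trigger for SLO-based alerts.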

I've focused a lot on session replays, RUM and, most recently, symbolication for native, Flutter and frontend applications; this might be a place where VictoriaMetrics could benefit from integrating with Traceway.
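
For native code, symbolication means mapping raw addresses from a crash report back to function names using the binary's symbol table. A minimal Python sketch with a hypothetical, hand-written symbol table: real tooling reads symbols from debug files and also resolves file and line numbers, which this sketch does not attempt, and frontend symbolication relies on source maps instead.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    start: int  # offset of the function within the binary
    size: int
    name: str


class SymbolTable:
    def __init__(self, symbols: list[Symbol]) -> None:
        self.symbols = sorted(symbols, key=lambda s: s.start)
        self.starts = [s.start for s in self.symbols]

    def lookup(self, offset: int) -> str:
        """Find the function containing `offset` via binary search over start offsets."""
        i = bisect.bisect_right(self.starts, offset) - 1
        if i >= 0:
            sym = self.symbols[i]
            if offset < sym.start + sym.size:
                return f"{sym.name}+0x{offset - sym.start:x}"
        return "??"  # the address falls outside every known function


def symbolicate(frames: list[int], load_base: int, table: SymbolTable) -> list[str]:
    """Subtract the runtime load address, then resolve each frame to a symbol."""
    return [table.lookup(addr - load_base) for addr in frames]


# Hypothetical symbol table and crash frames.
table = SymbolTable([
    Symbol(0x1000, 0x80, "main"),
    Symbol(0x1080, 0x200, "parse_config"),
    Symbol(0x1400, 0x40, "render"),
])
load_base = 0x7F0000000000
frames = [0x7F0000001090, 0x7F0000001410, 0x7F0000001300]
print(symbolicate(frames, load_base, table))  # ['parse_config+0x10', 'render+0x10', '??']
```

Sorting once and using binary search keeps each lookup logarithmic in the number of symbols, which matters when a binary has hundreds of thousands of them.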

If any of that sounds potentially interesting let me know. Again, thank you for your comment.

I mean, the idea of using OTEL with ClickHouse is rather new, and it solves the most painful part of metrics: high cardinality. It has its use cases, but it certainly comes with its own problems.
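
Cardinality here is the number of distinct label combinations (series) a metric produces, and a single unbounded label such as a user ID multiplies it. A minimal Python sketch that counts series per metric and distinct values per label to find the culprit, using hypothetical label sets; it is a generic illustration, not how ClickHouse, VictoriaMetrics or any OTEL pipeline computes it.

```python
from collections import defaultdict
from itertools import product


def cardinality_report(series: list[dict[str, str]]) -> tuple[dict[str, int], dict[str, int]]:
    """Return (series count per metric name, distinct value count per label)."""
    series_per_metric = defaultdict(set)
    values_per_label = defaultdict(set)
    for labels in series:
        series_per_metric[labels["__name__"]].add(tuple(sorted(labels.items())))
        for key, value in labels.items():
            if key != "__name__":
                values_per_label[key].add(value)
    return (
        {name: len(keys) for name, keys in series_per_metric.items()},
        {label: len(values) for label, values in values_per_label.items()},
    )


# Hypothetical series: 2 methods x 2 statuses x 1,000 user IDs = 4,000 series.
series = [
    {"__name__": "http_requests_total", "method": m, "status": s, "user_id": str(u)}
    for m, s, u in product(["GET", "POST"], ["200", "500"], range(1_000))
]

per_metric, per_label = cardinality_report(series)
print(per_metric)
# Labels sorted by how many distinct values they contribute; user_id stands out.
print(sorted(per_label.items(), key=lambda kv: kv[1], reverse=True))
```

Dropping `user_id` from this hypothetical metric would cut it from 4,000 series to 4, which is why finding the offending label is usually the first step.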

We're on AWS Managed Prometheus + Grafana in production, and it certainly scales just fine, although I'm sure that under the hood it's an entirely different beast from FOSS Prometheus; likely only AWS engineers truly know.