by parliament32
37 days ago
FWIW we've also tried all sorts of different things, and honestly the very vanilla stack (prometheus -> central thanos, fluentbit -> central loki, grafana) ends up on top. The resource consumption is surprisingly minimal (for a sense of scale, we run about 200k eps for metrics and 1k eps for logs). For all these solutions, I find myself asking the same question as you: what problem are you trying to solve? Is there anything actually different about your product other than less stability than the battle-tested stack?
|
I'm working on a comprehensive benchmark of Traceway performance on different hardware configurations. The most I've tested so far was 250k traces per second against the smallest managed ClickHouse instance, which handled it without a hiccup (but that's anecdotal, not a systematic result). You can check out the Traceway git repo; there is an issue I've opened for benchmarking, and you can subscribe to or comment on it if you're interested. I'm benchmarking across SQLite, self-hosted ClickHouse, and managed ClickHouse. I am a huge fan of systematic, realistic, and most of all reproducible benchmarks, so I am really excited about the progress on that.
Anyhow, you can check out Traceway and see what it offers. It's aimed at providing SLOs out of the box, session replays, alerting, configurable dashboards, and great exception tracking (automatic symbolication), among other things.