by dusanstanojevic 34 days ago

Hi, I'm the creator of Traceway.

I created Traceway because I looked at that stack and decided I wasn't going to add 7 more services that could all fail and that I would then have to maintain as well. Here is the list: Grafana, OTel Collector (to forward metrics), Prometheus, Loki, Tempo, Mimir, K8s.

This is not maintainable in production unless you have a person to manage it. My app handled about 500-1000 req/sec. That sounds like a lot, but it's extremely light from an observability perspective. Why would I add 7 more points of failure, and 7 more services to monitor for proper resource allocation, for something like this? To add insult to injury, I would have to keep building my SLOs myself, since they wouldn't be tracked automatically by default, and I would have to keep paying for Sentry because issue tracking is quite lacking in Grafana. Oh, I almost forgot: I would also have to get an alerting provider or pay for that (maybe I'm wrong, it was 6 months ago).

Anyhow, Traceway is a 60 MB binary written in Go. It works with ClickHouse or SQLite, and the data is stored on S3 when not in use. That means you can host it with SQLite on a $2 server, or even a free tier, and have it working for your side projects, or you can host it with managed ClickHouse and get auto-scaling at the DB level.

The goal is to give developers full observability and the tools to fix issues directly. What we have so far: alerts, notifications, SSO (Google & GitHub), integrations, metrics, preconfigured SLOs, distributed tracing, RUM/session recordings (JS & Flutter).

Almost forgot: you'd need a symbolicator as well, or your frontend/mobile exception stack traces will be messed up in Grafana. I don't even know which tool they have for that, but it's always a new service to host and maintain...