by hangonhn 1966 days ago
Damn. That's one hell of a set of credentials for the founders.

I was the engineer who was heavily involved with monitoring at my last job and a lot of what this is doing aligns with what I would have done myself. At my new job, I work on different stuff but I can see we're going to run into monitoring issues soon too. I'm so, so, so glad this is an option because I do not want to rebuild that stuff all over again. Getting monitoring scalable and robust is HARD!


Hey, thank you. :-) That’s kind of how we feel -- it seems like everyone is building tooling around Prometheus, and frankly, we hope that collective effort can be redirected toward more impactful work for our industry. On a personal note, most of us on the team have been there in one way or another, struggling to actually monitor our own work. We’ve had surprise Datadog bills and felt the pain of scaling Prometheus. (In fact, I’m planning a blog post about this struggle, so stay tuned.) It feels like this problem should already be solved, but it’s not. So we’re trying to fix it.
Prometheus is great; the main problem is the bloat of metrics it collects. One really needs to carefully define the rules for what to scrape, compute, and reduce, filtering out the metrics that are not needed and identifying the ones that need to be precomputed.
You’re absolutely right.

As mentioned earlier (https://news.ycombinator.com/item?id=25993825), our goal is to be super transparent; we want you to fully understand what you’re spending on infrastructure. We feel good that there’s an incentive to help you work through the problems that you’ve mentioned.

Attributing collection and querying is made easier with authentication enabled by default. You can make your tenants as fine- or coarse-grained as you want, handing out authentication tokens to the producers writing to those tenants. This makes it easier to trace back to sources of bloat. You can also place rate limits on individual tenants to prevent bloat in the first place.
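
To make the tenant idea concrete, here is a minimal sketch of token-based attribution plus a per-tenant rate limit. It is not Opstrace code: `TokenBucket`, `TenantLimiter`, the token strings, and the limits are all hypothetical names and values chosen for illustration, and the limiter is a plain token bucket.

```python
class TokenBucket:
    """Token bucket: refills at `rate` samples/second, holds at most `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start full so an initial burst is allowed
        self.updated_at = 0.0       # timestamp of the last refill, in seconds

    def allow(self, cost, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False


class TenantLimiter:
    """Maps auth tokens to tenants and enforces one bucket per tenant (hypothetical)."""

    def __init__(self, token_to_tenant, limits):
        # token_to_tenant: {auth_token: tenant}; limits: {tenant: (rate, capacity)}
        self.token_to_tenant = dict(token_to_tenant)
        self.buckets = {t: TokenBucket(r, c) for t, (r, c) in limits.items()}
        self.rejected = {t: 0 for t in limits}  # rejected samples, per tenant

    def ingest(self, auth_token, n_samples, now):
        tenant = self.token_to_tenant.get(auth_token)
        if tenant is None:
            return "unauthenticated"
        if self.buckets[tenant].allow(n_samples, now):
            return "accepted"
        # Counting rejections per tenant is what lets you trace bloat to its source.
        self.rejected[tenant] += n_samples
        return "rate_limited"


limiter = TenantLimiter(
    token_to_tenant={"tok-a": "team-a"},   # made-up token and tenant
    limits={"team-a": (10.0, 20.0)},       # 10 samples/s, burst of 20
)
first = limiter.ingest("tok-a", 15, now=0.0)
second = limiter.ingest("tok-a", 10, now=0.0)
third = limiter.ingest("tok-a", 10, now=1.0)
unknown = limiter.ingest("tok-x", 1, now=1.0)
```

The clock is passed in as `now` rather than read from the system so the behavior is easy to reason about; a real ingestion path would presumably use a monotonic clock and shared state.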

Additionally, we think users might reconsider the premise of the problem. Because Opstrace runs in your own cloud account, its cost follows cloud economics, so it's basically as cheap as it can possibly be. You might find that you are under less pressure to curate what is stored than you think. (I didn’t say "no" pressure, but "less" might be a huge improvement. :-) )