This looks pretty neat. One thing that would be really cool would be support for event logging of protobuf messages. Some way to give squizy a protobuf bin file that describes all of my messages or something, and then a way to generically send you a bunch of protos and have your API serialize them and allow queries and notifications based on these messages.
One thing we have at work is something that's like this:
Right now we're turning these protos into JSON and storing them in MongoDB for easy queries. This way we can do things like "COUNT(*) GROUPED BY login_failed.username" and find accounts that are being targeted by bots, for example.
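To make the kind of query I mean concrete, here's a rough sketch in plain Python rather than MongoDB. The event shape (a top-level `login_failed` object with a `username` field) and the helper name `count_by_field` are made up for illustration, not our actual schema:

```python
from collections import Counter
import json

# Hypothetical JSON events, shaped as if they had been converted from protobuf messages.
RAW_EVENTS = [
    '{"login_failed": {"username": "alice"}}',
    '{"login_failed": {"username": "bob"}}',
    '{"login_failed": {"username": "alice"}}',
    '{"login_succeeded": {"username": "carol"}}',
    '{"login_failed": {"username": "alice"}}',
]


def count_by_field(raw_events, event_type, field):
    """Count events of `event_type`, grouped by the value of `field`."""
    counts = Counter()
    for raw in raw_events:
        event = json.loads(raw)
        payload = event.get(event_type)
        # Skip events of other types or events missing the grouping field.
        if payload is not None and field in payload:
            counts[payload[field]] += 1
    return counts


failed_logins = count_by_field(RAW_EVENTS, "login_failed", "username")
# Most frequently targeted accounts first.
top_targets = failed_logins.most_common()
```

A real store would do the grouping server-side, but the idea is the same: group by a nested field and look at the largest counts.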
Seems a little harsh! Prometheus is a big project that's been around a long time, and I'm not an expert in it, but I don't think it's aspiring to do incident notification or APM, is it? I think it's just metrics. Maybe it would be more constructive to point out specific things that were unclear or confusing to you in the overview, and/or to suggest that they integrate with Prometheus for the things it's already excellent at and avoid reinventing them? Dunno, just my $0.02, but they're both Go projects and it looks decent at a glance to me, probably just using material I think?
Anyway, I think open source stuff sometimes needs constructive criticism but should always be appreciated first as a contribution to the ecosystem even if you're not personally planning to use it.
Prometheus has AlertManager which provides a framework for incident notification (we route incidents to Mattermost and PagerDuty, for example; PD ends up being our big incident response tool, which lets us cascade into a variety of "wake the sysadmin up" methods). It doesn't do APM, but it wouldn't be difficult to expose a Prometheus agent for your APM (just like you'd expose metrics for anything else you want to monitor).
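To give an idea of how little is involved in exposing metrics, here's a stdlib-only sketch that renders a counter in the Prometheus text exposition format, which is what a scrape endpoint returns. In practice you'd use an official client library instead; the metric name `app_transactions_total`, its labels, and the helper `render_counter` are invented for the example:

```python
def escape_label_value(value):
    """Escape backslashes, double quotes and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render one counter metric as Prometheus text exposition lines.

    `samples` is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            # No labels: emit the bare metric name.
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical APM-style counter: transactions per endpoint and status code.
output = render_counter(
    "app_transactions_total",
    "Transactions handled, by endpoint and status.",
    [
        ({"endpoint": "/checkout", "status": "200"}, 1027),
        ({"endpoint": "/checkout", "status": "500"}, 3),
    ],
)
```

Serve that string over HTTP at a path like `/metrics` and Prometheus can scrape it; alert rules on top of it are then ordinary queries.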
I appreciate new tools, but I do think it's fair to ask what it does better than the existing tools. Prometheus' biggest problem is its learning curve, IMO, so there might be some gains to be made there, but after using it, I think the learning curve is a function of its architecture, which is a large part of what makes it so resilient. If it can be improved while maintaining (or improving on) resilience, awesome, but I personally know that I won't sleep well at night if my monitoring service isn't rock-solid.
Prometheus doesn't work with transactions; it is just a tool for storing metrics. I think we will have an integration for that too, but right now our plan is to improve the current system.
About the dashboard: we fully agree with you, but for that we need more experience in UI/UX and design. Also, we are not as big as, for example, Datadog. But we will be glad to improve it and to hear suggestions.
The problem is that transactions cost resources and latency, and they are actually not that relevant.
Why? Because you are not building a tracer but a monitoring tool.
Your dashboard is very hard to read because it is basically just tables, and your tables do not give you any visual cues. I would highly recommend getting some icons from something like https://fontawesome.com/ and making it visually clear which screen you are on.
If you have any status text, like open/closed etc., give it an appropriate color, like green and red.
Your rules look like code, so why not show them with a code theme? Like how GitHub's markdown makes code look different from the rest of the text.
Give buttons appropriate colors.
In your live view, make sure that certain text is aligned vertically.
Give your Memory graphs proper units. 500000000 is not helpful.
I would highly recommend taking a little bit of time to look at existing solutions that work well, so you can see what makes sense and what doesn't.
I would argue that not using something like Grafana for your frontend is a big missing feature.
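On the memory graph units point above, this is roughly the formatting I have in mind. It's a small sketch, not taken from any particular dashboard; the helper name `format_bytes` and the one-decimal rounding are my own choices:

```python
def format_bytes(num_bytes, binary=False):
    """Format a byte count with a readable unit.

    binary=False uses powers of 1000 (kB, MB, ...);
    binary=True uses powers of 1024 (KiB, MiB, ...).
    """
    base = 1024 if binary else 1000
    if binary:
        units = ["B", "KiB", "MiB", "GiB", "TiB"]
    else:
        units = ["B", "kB", "MB", "GB", "TB"]

    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if abs(value) < base or unit == units[-1]:
            if unit == "B":
                return f"{int(value)} B"
            return f"{value:.1f} {unit}"
        value /= base


decimal_label = format_bytes(500000000)              # "500.0 MB"
binary_label = format_bytes(500000000, binary=True)  # "476.8 MiB"
```

Either convention is fine as long as the axis says which one it is.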
I totally agree with you... it's very hard to discern what is going on at a glance. I have to say that I think this is more of a demonstration of the creator's talent than an actual product.