|
A bit disappointed by the architecture -- it's a Django stack with MySQL, Redis, RabbitMQ, and Celery -- for what is effectively AlertManager (a single golang binary) with a nicer web frontend + Grafana integration + etc. I'm curious why/if this architecture was chosen. I get that it started as a standalone product (Amixr), but in the current state it is hard to rationalize deploying this next to Grafana in my current containerless setting. |
Helm (https://github.com/grafana/oncall/tree/dev/helm/oncall), docker-composes for hobby and dev environments.
Besides deployment, there are two main priorities for OnCall architecture: 1) It should be as "default" as possible. No fancy tech, no hacking around 2) It should deliver notifications no matter what.
We chose the most "boring" (no offense Django community, that's a great quality for a framework) stack we know well: Django, Rabbit, Celery, MySQL, Redis. It's mature, reliable, and allows us to build a message bus-based pipeline with reliable and predictable migrations.
It's important for such a tool to be based on message bus because it should have no single point of failure. If worker will die, the other will pick up the task and deliver alert. If Slack will go down, you won't loose your data. It will continue delivering to other destinations and will deliver to Slack once it's up.
The architecture you see in the repo was live for 3+ years now. We were able to perform a few hundreds of data migrations without downtimes, had no major downtimes or data loss. So I'm pretty happy with this choice.