by brunoscheufler
699 days ago
Bruno from Inngest here, thanks for asking! In general, we use OpenTelemetry[1] for instrumenting our services in production, collecting metrics and logs for important events. Specifically, we have set up:

- multiple dashboards informing us about current system usage (events received, processed), including e2e latency distributions, compute resource usage for different deployments, and top operations
- metrics on critical systems (data stores including Redis, messaging infrastructure, connection poolers for Postgres, etc.) to gauge current resource utilization and typical load patterns
- alerting on unexpected deviations in KPIs (a subset of the metrics above) to help us spot and react to issues quickly
- forecasting on product usage and compute resource utilization patterns for planning medium- to long-term infrastructure work

Hope this helps!

[1]: https://opentelemetry.io/
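For readers wondering what "alerting on unexpected deviations in KPIs" can look like, here is a minimal sketch of one common approach: a rolling z-score against recent history. It assumes a single numeric KPI sampled at a fixed interval (for example, events processed per minute). The `DeviationAlert` class, the window size, and the threshold are all hypothetical illustrations, not Inngest's actual alerting rules. In practice this logic usually lives in the monitoring backend that receives the OpenTelemetry metrics, not in application code.

```python
from collections import deque
from statistics import mean, pstdev


class DeviationAlert:
    """Hypothetical sketch: flag a KPI sample far from its rolling baseline.

    window, threshold and min_samples are illustrative assumptions.
    """

    def __init__(self, window: int = 30, threshold: float = 3.0, min_samples: int = 5) -> None:
        self.samples = deque(maxlen=window)  # most recent KPI values
        self.threshold = threshold            # z-score above which we alert
        self.min_samples = min_samples        # history needed before judging

    def observe(self, value: float) -> bool:
        """Record a sample and return True if it should trigger an alert."""
        alert = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = pstdev(self.samples)
            if sigma == 0:
                # Perfectly flat baseline: any change counts as a deviation.
                alert = value != mu
            else:
                alert = abs(value - mu) / sigma > self.threshold
        # The sample joins the baseline whether or not it alerted.
        self.samples.append(value)
        return alert


if __name__ == "__main__":
    detector = DeviationAlert(window=10)
    for v in [100, 102, 98, 101, 99, 100, 103, 97, 500]:
        print(v, detector.observe(v))
```

With a steady baseline around 100, small jitter stays quiet while a sudden spike to 500 is flagged. Because alerting samples still enter the window, a sustained shift eventually becomes the new baseline; production setups often add cooldowns or seasonality-aware baselines on top of this.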