by jpollock
138 days ago
Measurement and alerting are usually done on business metrics, not on the causes. That way you catch whole classes of problems. Not sure about expected loss; is that a decay rate? But stuck jobs show up in the number of tasks being processed and in average latency.
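
To make that concrete, here's a rough sketch of alerting on the symptoms (tasks processed and average latency) rather than on any specific cause. The names (`Window`, `stuck_job_alerts`) and the thresholds are made up for illustration and not taken from any particular monitoring system.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Window:
    """Metrics collected over one measurement window (hypothetical shape)."""
    tasks_completed: int
    latencies_s: List[float]  # per-task latency in seconds


def stuck_job_alerts(window: Window, min_tasks: int, max_avg_latency_s: float) -> List[str]:
    """Return the names of alerts that fire for this window.

    Both checks look at outcomes, so they fire whatever the underlying
    cause is (dead worker, lock contention, slow dependency, ...).
    """
    alerts = []
    # Too few tasks finished: something is stuck or starved.
    if window.tasks_completed < min_tasks:
        alerts.append("throughput_low")
    # Tasks finish, but too slowly on average.
    if window.latencies_s and mean(window.latencies_s) > max_avg_latency_s:
        alerts.append("latency_high")
    return alerts
```

A window with no completed tasks trips the throughput check even though there are no latencies to average, which is the stuck-job case.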