Most SLO tools treat a user journey as binary: either all services are up, or the whole thing is down. That breaks when traffic doesn't flow uniformly through all your services.

The checkout SLO that lied
Three services: checkout-base (99.9%), payments (99.95%), coupon (99.5%). A naive AND composition (0.999 × 0.9995 × 0.995) gives a system SLO of ~99.35%. But 90% of users never hit the coupon service. Only 10% go through base → coupon → payments. The coupon service drags the number down, but it only affects a tenth of my traffic. The correct formula weights each route by its traffic share, where e_x is the error rate of service x:

e_total = 1 - (
    0.9 × (1 - e_base) × (1 - e_payments)
    +
    0.1 × (1 - e_base) × (1 - e_coupon) × (1 - e_payments)
)
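To see how far the weighted number sits from the naive one, the formula can be evaluated by hand. The sketch below is only an illustration: it uses the three targets quoted above as stand-ins for measured success rates, and the names `success`, `routes`, and `chain_success` are made up for this example.

```python
# Targets from the example, used as stand-ins for measured success rates.
success = {"base": 0.999, "payments": 0.9995, "coupon": 0.995}

# (weight, chain) pairs; weights are traffic shares and must sum to 1.
routes = [
    (0.9, ["base", "payments"]),
    (0.1, ["base", "coupon", "payments"]),
]


def chain_success(chain):
    """Probability that every service in the chain succeeds (product of rates)."""
    p = 1.0
    for service in chain:
        p *= success[service]
    return p


# Naive AND: every request is assumed to need all three services.
naive = chain_success(["base", "payments", "coupon"])

# Weighted routes: each chain contributes in proportion to its traffic share.
weighted = sum(weight * chain_success(chain) for weight, chain in routes)
e_total = 1 - weighted

print(f"naive AND:       {naive:.4%}")
print(f"weighted routes: {weighted:.4%}")
print(f"e_total:         {e_total:.6f}")
```

With these inputs the naive composition lands near 99.35% while the weighted one lands near 99.80%, which is the gap the coupon service was introducing for traffic that never touches it.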
Each route is a chain where all services must succeed (multiply success rates). Weights represent traffic share and must sum to 1.

Translating this into PromQL
PromQL has no native "product of a set" operator. What it has is scalar(), which collapses a single-element vector into a scalar. That is exactly what you need when each slok:sli_error_rate recording rule returns one value. The generated rule for a 5m window:

1 - (
    0.9 * (
        (1 - scalar(slok:sli_error_rate:5m{slo_name="checkout-base-slo",...}))
        * (1 - scalar(slok:sli_error_rate:5m{slo_name="payments-slo",...}))
    )
    + 0.1 * (
        (1 - scalar(slok:sli_error_rate:5m{slo_name="checkout-base-slo",...}))
        * (1 - scalar(slok:sli_error_rate:5m{slo_name="coupon-slo",...}))
        * (1 - scalar(slok:sli_error_rate:5m{slo_name="payments-slo",...}))
    )
)
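An expression of this shape can be rendered mechanically from a list of weighted routes. The following is a rough sketch of one way to do it, not slok's actual generator: `success_term` and `render_composition` are hypothetical names, and the selector contains only the `slo_name` matcher because the remaining labels are elided in the rule above.

```python
WINDOWS = ["5m", "1h", "6h", "3d", "7d", "30d"]

# Alias -> SLO name, mirroring the three services in the example.
slo_names = {
    "base": "checkout-base-slo",
    "payments": "payments-slo",
    "coupon": "coupon-slo",
}

# (weight, chain of aliases)
routes = [
    (0.9, ["base", "payments"]),
    (0.1, ["base", "coupon", "payments"]),
]


def success_term(slo_name, window):
    # Assumption: only slo_name is matched here; the real rule carries
    # additional label matchers that are not reproduced in this sketch.
    return f'(1 - scalar(slok:sli_error_rate:{window}{{slo_name="{slo_name}"}}))'


def render_composition(routes, slo_names, window):
    """Render 1 - sum(weight * product of per-service success rates)."""
    parts = []
    for weight, chain in routes:
        product = " * ".join(success_term(slo_names[a], window) for a in chain)
        parts.append(f"{weight} * ({product})")
    return "1 - (" + " + ".join(parts) + ")"


# One expression per evaluation window, keyed by the recording rule name.
rules = {
    f"slok:sli_error_composition_rate:{w}": render_composition(routes, slo_names, w)
    for w in WINDOWS
}

print(rules["slok:sli_error_composition_rate:5m"])
```

Keeping the rendering a pure function of the routes and the window makes it easy to produce the same expression for every window without hand-editing each rule.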
scalar() is load-bearing. Without it you'd be multiplying labeled vectors with different label sets: PromQL would try to match them label-for-label, find no matching pairs, and return an empty result. The rule is generated for each evaluation window (5m, 1h, 6h, 3d, 7d, 30d) and stored as slok:sli_error_composition_rate:WINDOW. Everything downstream (burn rate, alerts, status) consumes this single metric without knowing how it was produced.

The YAML interface

kind: SLOComposition
spec:
  target: 99.9
  window: 30d
  objectives:
    - name: base
      ref: { name: checkout-base-slo }
    - name: payments
      ref: { name: payments-slo }
    - name: coupon
      ref: { name: coupon-slo }
  composition:
    type: WEIGHTED_ROUTES
    params:
      routes:
        - name: no-coupon
          weight: 0.9
          chain: [base, payments]
        - name: with-coupon
          weight: 0.1
          chain: [base, coupon, payments]
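A spec like this carries a few invariants that are worth checking before any rule is generated: weights sum to 1, and every alias in a chain refers to a declared objective. The sketch below shows what such checks might look like on plain dictionaries. It is not slok's webhook; `validate_routes` is a hypothetical helper, and reading "duplicate alias" as "the same alias twice in one chain" is an assumption.

```python
import math

objectives = [
    {"name": "base", "ref": {"name": "checkout-base-slo"}},
    {"name": "payments", "ref": {"name": "payments-slo"}},
    {"name": "coupon", "ref": {"name": "coupon-slo"}},
]

routes = [
    {"name": "no-coupon", "weight": 0.9, "chain": ["base", "payments"]},
    {"name": "with-coupon", "weight": 0.1, "chain": ["base", "coupon", "payments"]},
]


def validate_routes(objectives, routes, tol=1e-9):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    declared = {o["name"] for o in objectives}

    # Weights are traffic shares, so they must add up to 1.
    total = sum(r["weight"] for r in routes)
    if not math.isclose(total, 1.0, abs_tol=tol):
        errors.append(f"route weights sum to {total}, expected 1")

    for r in routes:
        chain = r["chain"]
        # Every alias in a chain must point at a declared objective.
        unknown = [a for a in chain if a not in declared]
        if unknown:
            errors.append(f"route {r['name']}: unknown aliases {unknown}")
        # Assumed meaning of "duplicate alias": repeated within one chain.
        if len(set(chain)) != len(chain):
            errors.append(f"route {r['name']}: duplicate alias in chain")
    return errors


print(validate_routes(objectives, routes))
```

Returning a list of problems rather than raising on the first one lets a caller report everything wrong with a spec in a single pass.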
With the composed error rate as a recording rule, the standard multi-window burn rate pipeline works unchanged. The alert fires when the composed journey is burning budget too fast: not when any single service degrades, but when the degradation actually impacts users at the rate the weights predict.

Limitations
scalar() assumes each input recording rule returns exactly one series. If a query matches multiple series (or none), scalar() returns NaN and the composition breaks silently. Also: duplicate alias detection in routes isn't enforced yet by the webhook.

This is alpha. Feedback on the API shape welcome.

Repo: https://github.com/federicolepera/slok