|
|
|
|
|
by hruk
808 days ago
|
|
p99.9 referring to latency. However, we also do a weekly test of how quickly we recover from a catastrophic crash, which is roughly about 6 minutes (which is the amount of time it takes for the autoscaling group to spin up a new host, Litestream to restore the database from s3, and the server to start up again). Honestly, 99.9% uptime is pretty generous - we can fit in quite a few catastrophes per year and still have 99.9% uptime. In the 2 years this service has been running, we've had 100% uptime via zero-downtime deployments, anyway. In terms of monitoring, traces and error logs are shipped to our observability solution, yes. |
|