I think this status page is inaccurate - hosting is affected. My app is _unfixably_ broken right now. I have an app on Fly whose VM appears to have died due to these issues, and because deploys and restarts are broken, I have literally no way of fixing it. https://community.fly.io/t/ewr-app-is-completely-inaccessibl...
This is what is worrying me about moving over to Fly. I am surprised that it has been so heavily pushed here on HN. Perhaps this is just a relatively isolated event; we will see how it is handled moving forward.
This didn't actually kill VMs, but it _did_ prevent them from being rescheduled for upwards of an hour. The vast majority of apps running on the platform had 100% uptime throughout the incident. The ones that didn't were relying on our rescheduling infrastructure to recover from app errors.
Except my app isn't down due to an app error, but due to a failed host in EWR which I couldn't escape from (because of the concurrent scheduling issues): https://status.flyio.net/incidents/v2dshzvy1mcl
EDIT: I recognize that these may be poorly timed but unrelated incidents, but it has been frustrating to be trapped on a broken box for 12 hours while the status page tells me it's just new deploys that are borked :)
I don't want to belabor this, because we need to do a much better job of making it obvious, but single-node, development Postgres databases are going to have downtime in our infrastructure. We'll get that host back for you, but you should _definitely_ add a replica if you care about availability.