Whenever there's an incident like this, it seems like the status pages take a while to reflect the problem. Why is that? The part of the status page that can be manually updated is now correct, but most of the automated checks still show 100% uptime, or only "degraded performance" in some cases, even though the services are fully offline.
If I were implementing a status page, it might look something like a ping to some URL from various regions. Assuming that's what these status pages are doing, why do they often say "All systems operational" until well into the downtime? It's frustrating that I have to confirm on HN before I can know for sure that something isn't just down for me.
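To make the idea concrete, here is a rough Python sketch of what I have in mind. It is only an assumption about how such a check could work, not a description of how Render's status page (or anyone else's) is actually built. The names `probe` and `aggregate_status`, the status strings, and the region labels are all made up for illustration, and in practice each probe would have to run from a machine inside the region it reports on.

```python
from urllib.error import URLError
from urllib.request import urlopen


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (URLError, OSError, ValueError):
        # HTTP errors, refused connections, timeouts, and malformed URLs
        # all count as a failed check.
        return False


def aggregate_status(results: dict[str, bool]) -> str:
    """Collapse per-region probe results (region -> reachable?) into one label."""
    if not results:
        return "unknown"
    failures = sum(1 for ok in results.values() if not ok)
    if failures == 0:
        return "operational"
    if failures == len(results):
        return "major outage"
    return "partial outage"
```

For example, `aggregate_status({"oregon": False, "ohio": True})` gives `"partial outage"`, which is at least more honest than "All systems operational" while a human works out what is really going on.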
(Render CEO) As an engineer myself, I understand your frustration. Updating the status page automatically is a hard problem when the systems in question are distributed and complex. Perhaps we could post a generic 'something is wrong' message using the automated checks we have in place, and go from there.
For me it is the speed of updates: I posted an outage on our own status page a good 15m before they did on theirs. I had to email them before I got an ACK.
(Render CEO) Things are recovering. We're still investigating, but the issue seems unrelated to Cloudflare. We'll post more updates on https://status.render.com.
We continue updating https://status.render.com, and your services should have come back online within the last hour. If not, please email me, and we'll figure it out.
It appears to be all but Ohio that are down now. It's hard to tell from the status page, though, which says "No downtime recorded on this day" for all locations 2h into the incident. I have one Oregon database that has come back online, though, so it looks like things are gradually being restored.
Update - Many services recovered automatically; engineering is continuing to identify still-affected services and mitigate issues as necessary.
Mar 26, 2024 - 16:28 UTC
Update - We are continuing to work on a fix for this issue.
Mar 26, 2024 - 16:20 UTC
Identified - We are encountering a broad range of outages across the Render Platform affecting connections and services. Engineering is working on mitigating the cause of these issues and narrowing down any non-affected components.
Mar 26, 2024 - 16:19 UTC