Multiple Digital Ocean services down (status.digitalocean.com)
115 points by inanothertime 207 days ago
10 comments

I use DO's load balancers in a couple of projects, and they don't list Cloudflare as an upstream dependency anywhere that I've seen. It's so frustrating to think you're clear of a service then find out that you're actually in their blast radius too through no fault of your own.
It is mentioned in their list of subprocessors: https://www.digitalocean.com/trust/subprocessors
I find stuff like this all the time. railway.com recently launched an object storage service, but it's simply a wrapper for Wasabi buckets under the hood, and they don't mention this anywhere, not even on the subprocessors page (https://railway.com/legal/subprocessors). Customers have no idea they are using Wasabi storage buckets unless they dig around the DNS records, so I have to do all this research to find upstream dependencies, go subscribe to status.wasabi.com alerts, etc.

dig b1.eu-central-1.storage.railway.app +short

s3.eu-central-1.wasabisys.com.

eu-central-1.wasabisys.com.
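A minimal sketch of how that kind of check could be automated, assuming you already have the `dig +short` output as text. The suffix table is a hypothetical, hand-maintained mapping: only the `wasabisys.com` entry comes from the output above, and the other entries are illustrative assumptions, not an authoritative list.

```python
# Hypothetical suffix -> provider table. Only "wasabisys.com" is taken from
# the dig output above; the other entries are illustrative assumptions.
KNOWN_SUFFIXES = {
    "wasabisys.com": "Wasabi",
    "cloudflare.net": "Cloudflare",
    "amazonaws.com": "AWS",
}


def upstream_providers(dig_short_output: str) -> list[str]:
    """Return provider names whose domain suffix appears in `dig +short` output."""
    providers: list[str] = []
    for line in dig_short_output.splitlines():
        # Normalize: drop whitespace, the trailing root dot, and case.
        host = line.strip().rstrip(".").lower()
        if not host:
            continue  # skip blank lines
        for suffix, name in KNOWN_SUFFIXES.items():
            matches = host == suffix or host.endswith("." + suffix)
            if matches and name not in providers:
                providers.append(name)
    return providers


if __name__ == "__main__":
    sample = "s3.eu-central-1.wasabisys.com.\n\neu-central-1.wasabisys.com.\n"
    print(upstream_providers(sample))
```

This only catches dependencies that show up in the CNAME chain. A provider that proxies traffic behind its own IPs would not be visible this way.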

Hey, I'm the person that was responsible for adding object storage to Railway. It was my onboarding project, basically a project I was able to choose myself and implemented in 3 weeks in my 3rd month after joining Railway.

Object Storage is currently in Priority Boarding, our beta program. We can and will definitely do better, document it and add it to the subprocessor list. I'm really sorry about the current lack of it. There was another important project that I had to do between the beta release of buckets and now. I'm on call this week, but will continue to bring Buckets to GA next week. Just to give some context: there's no intentional malevolence or shadiness going on. It's simply that there's 1 engineer (me) working on it, and there's a lot of stuff to prioritize and do.

It's also super important to get user feedback as early as possible. That's why it's a beta release right now, and the beta release is a bit "rushed". The earlier I can get user feedback, the better the GA version will be.

On the "simply a wrapper for wasabi buckets" point: yes, we're currently using Wasabi under the hood. I can't add physical Object Storage within 3 weeks to all our server locations :D But that's something we'll work towards. I wouldn't say it's "simply" a wrapper, because we're adding substantial value when you use Buckets on Railway: automatic bucket creation for new environments, variable references, credentials as automatic variables, inclusion in your usage limits and alerts, and so on.

I'll do right by you, and by all users.

Slightly off topic: I used DO LBs for a little while but found myself moving away from that toward a small droplet with an haproxy or nginx setup. Worked much better for me personally!
The point of an LB for these projects is to get away from a single point of failure, and I find configuring HA and setting up the networking and everything to be a pain point.

These are all low-traffic projects so it's more cost effective to just throw on the smallest LB than spend the time setting it up myself.

If they are small projects, why are they behind a load balancer to begin with?
Usually because of SSL termination. It's generally "easier" to just let DO manage getting the cert installed. Of course, there are tradeoffs.
I use the LBs for high availability rather than needing load balancing. The LB + 2 web back-ends + Managed DB means a project is resilient to a single server failing, for relatively low devops effort and around $75/mo.
Are both servers deployed from the exact same repo/scripts? Or are they meaningfully different, and/or balanced across multiple data centers?

Did your high availability system survive this outage?
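A rough way to reason about the LB + 2 back-ends + Managed DB setup described above is a series/parallel availability estimate. This is only a sketch: it assumes components fail independently, and the availability figures are made-up placeholders, not DigitalOcean's numbers.

```python
def parallel(*availabilities: float) -> float:
    """Availability of redundant components: up if at least one is up."""
    all_down = 1.0
    for a in availabilities:
        all_down *= (1.0 - a)
    return 1.0 - all_down


def series(*availabilities: float) -> float:
    """Availability of a chain: up only if every component is up."""
    total = 1.0
    for a in availabilities:
        total *= a
    return total


if __name__ == "__main__":
    # Placeholder figures (assumptions), chosen only to show the shape of the math.
    lb, web, db = 0.999, 0.99, 0.999
    web_pair = parallel(web, web)        # two redundant web back-ends
    stack = series(lb, web_pair, db)     # LB -> web pair -> managed DB
    print(f"web pair: {web_pair:.6f}, whole stack: {stack:.6f}")
```

The redundant web tier stops being the weak point, but the LB and the DB remain in series, so anything the LB itself depends on still caps the availability of the whole stack.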

Regional LBs do not have Cloudflare as an upstream dependency.
They don't name names but it's probably due to the ongoing Cloudflare explosion. I know the DigitalOcean Spaces CDN is just Cloudflare under the hood.
Just Spaces CDN, not Spaces - you'd think they'd just turn the CDN off for a bit.
You can't just "turn off CDN" on the modern internet. You'd instantly DDoS your customers' origins. They're not provisioned to handle it, and even if they were, the size of the pipe going to them isn't. The modern internet is built around the expectation that everything is distributed via CDN. Some more "traditional" websites would probably be fine.
Might be just me, but I can think of many origins under my control which could live without a (non-functional) CDN for a while.

A CDN is great for peak load, latency reduction, and cost - but not all sites depend on it for scale 24/7.

If you are DO you could; you just decided not to bother. They control the origins, since it's Spaces (S3), so they could absolutely spin up further gateways or a cache layer and then turn the CDN off.
Either you are wrong and they do not have the capacity to do that, or they have decided it is acceptable to be down because a major provider is down.

I imagine a cache layer cannot be that easy to spin up - otherwise why would they outsource it?

You outsource it because Cloudflare has more locations than you, so it offers lower latency, and it can do so at a cost that's cheaper than or the same as doing it yourself.
nit: that's more DoS (from a handful of DO LBs) than DDoS.
Yes, all sites are showing the Cloudflare error due to the massive outage. Seems their outages are getting more frequent and taking down the internet in new ways each time.
Man, it really seems like the cloud providers are having some tough times lately. Azure, AWS, and Cloudflare! Is everything just secretly AWS?
I have two projects on DO using droplets and they are still running fine.
Droplets are fine.

> This incident affects: API, App Platform (Global), Load Balancers (Global), and Spaces (Global).

It seems to be mostly a Cloudflare-related issue.

My DOs are working fine as well.

Are you using their "reserved IPs"? I was thinking of starting to use them, but now I wonder if it is part of their load balancing stack under the hood.
So yesterday Azure got hit hard, today CF and DO are down, bad week or something else?
Azure DDoS event happened in October. Blog post about the attack was published yesterday, and was quickly picked up by news sites.
DDOS, but I don't really understand why in particular.
Having known people like this, it's either flexing about who has the more powerful botnet or advertising who can do what.
NATO testing internal infra, or Russian hackers stepping it up after aggressive sabotage efforts in Eastern Europe?
I would also like to know people’s opinion on this.
Year-end promotion cycle is the worst time for end-users and the best one for engineers greedy for promotions.
Don't blame individual engineers who want to do what will be rewarded instead of company performance policies that reward this type of behavior.
shoot, there are also end of year layoffs and reorgs to pump up those year end numbers
what engineers, mate? they AI now

and they're doing just spectacular

I knew it, the DigitalOcean CDN is using Cloudflare behind the scenes. Why, DO?
Cloudflare outage.
Who is next?
My guess would be to look at who has a FedRAMP-capable service first.

Maybe also GCP, Hetzner, Akamai.

Dominos falling into dominos falling into dominos…