by sergiosgc 2567 days ago
> That’s a really fair and reasonable response. Not sure what else people really expect here.

The root cause of suspension is incomprehensible to me. They were suspended because they launched a set of instances and these were using 100% CPU. How is that unreasonable and cause for suspension?

I'm not a Digital Ocean customer, but if I were, I'd expect to be able to use the resources I bought without risk of being suspended. This is the root cause. It was compounded by incompetent customer support, but I really do not understand the suspension cause.

The response tackles all secondary factors, but does not talk about the root cause. I'd expect it to.

3 comments

Agreed. They say in the postmortem that it was protection against crypto mining, but what kind of weird reason is that?! If I want to pay for 10 instances mining crypto, why the hell wouldn't I be allowed to do that? I don't see why they should block any workload as long as the credit card details are valid.
It's a customer protection method. Most cryptominers are not using accounts they pay for. They compromise customer accounts and spin up resources. If you aren't proactive about communicating this to customers or blocking it, it can be quite some time before the customer notices and almost all customers will request a refund - even when the attack is a compromised password / successful phish on the customer's side.

Additionally, all cloud providers operate on various models of over-subscription. It is not in anyone's (customer / provider) interest to allow the full consumption of resources when the activity is fraudulent.

As you can see in the post-mortem, they are fine with the usage. They have a process and flag to allow legitimate customers to use their resources. However, based on previous experience at another cloud provider, I would bet that over 90% of those automated hits are correct.

This was bad support. They know that and they seem to be making the right moves to fix it. Fraud is bad for everyone and has to be combated. Not doing so can raise prices and kill a business like DO. I'm sure they feel awful that a customer was so poorly impacted, but the error wasn't in the first ban, it was everything after that.

Part of the whole issue here revolves around shared hosting, in my opinion. Host hardware is so oversold that one customer utilizing 100% CPU affects a handful of other customers so much that it's not allowed at all. I have seen providers that have terminated services for less than 100% CPU usage; a constant 90% is enough on some of them. But because of the profit margins in shared hosting, providers are able to charge incredibly low prices per instance and oversell their hardware, sometimes by as much as 10 to 20 times. That's as many as four hundred customers on a box that should maybe have 20 if it weren't oversold at all. In this case it really is an instance of you get what you pay for.

The service we provide has no oversold hardware and all dedicated plans. Some people are initially very turned off by the pricing, but being able to let customers mine if they want to without affecting a single other customer on the platform, giving each customer the same experience regardless of anyone else's resource utilization, leads to much happier customers, even if it means smaller profit margins for us. At the end of the day, customer experience and the support provided are two of the most important factors in running a hosting provider.

While I disagree with aspects of DigitalOcean's business model as a shared hosting provider, I do think that the response to this was more than appropriate and better than would be expected of a lot of shared hosting providers, provided they actually implement any of the things talked about in the response.
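
To put rough numbers on the overselling claim above, here is a minimal Python sketch. The function names are made up for illustration, and the 20-customer box and the 10x to 20x ratios are just the figures from the comment, not measurements from any provider.

```python
def customers_per_host(dedicated_capacity: int, oversell_ratio: float) -> int:
    """Customers placed on a host that would hold `dedicated_capacity` if not oversold."""
    return round(dedicated_capacity * oversell_ratio)


def cpu_share_if_all_busy(oversell_ratio: float) -> float:
    """Fraction of the purchased CPU each customer gets if everyone pegs it at once."""
    return 1.0 / oversell_ratio


# Hypothetical box sized for 20 dedicated customers, at the ratios from the comment.
for ratio in (1, 10, 20):
    customers = customers_per_host(20, ratio)
    share = cpu_share_if_all_busy(ratio)
    print(f"{ratio}x oversold: {customers} customers, {share:.0%} of purchased CPU each")
```

At 20x, the box holds 400 customers and each would get about 5% of what they bought if all of them ran flat out, which is why a sustained 100% workload stands out on an oversold host.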
When you say we/us as a more expensive, but dedicated alternative, what is the cost difference as a percentage for say a small project?

Edit: found your site, looks like you’re cheaper than aws at a glance

If it was just the first point, the customer should be able to confirm that the activity was intended without even going through human review. It should be like when your bank texts you to confirm an odd transaction. They don't simply lock your account.

It sounds much more like it was the second point, which is unsettling. It's one thing to plan your pricing based on the assumption that most customers won't maximally-utilize. It's another thing to enforce a soft-limit that's vague and below what was advertised. I'd much rather have a lowered, known limit than whatever this is.

I totally agree, but unfortunately my bank (a major U.S. bank) does block a transaction and sometimes lock my credit card completely when they think the transaction is suspicious. There’s no confirmation mechanism, I have to call them to get the card unlocked. Of course, this usually gets resolved within five minutes (except that one time when I had to renew a .ng domain, and the Nigeria-originated transaction got auto-blocked three times in a row, and eventually the case had to be escalated to override their security mechanism entirely), not 29 hours.
> They don't simply lock your account.

Capital One did this to me once, and refused to restore the use of the blocked account even after I immediately called them and confirmed that the transaction that triggered the block was not fraudulent.

It was in combination with a lack of payment history. So if they had been paying it would not have triggered but they had been working off of credits instead. I think this point addresses your concern that paying customers should be allowed to mine.
I ran some really long compute jobs on GCP (100% CPU for weeks across many vCPUs) with credits without getting flagged. I was evaluating FFTW performance for a project. Perhaps GCP could tell I was calling into FFTW and not mining so they decided it wasn't fraud?
It makes sense for a company like DO to not allow crypto miners to use credits. Otherwise miners would build elaborate systems to create fake accounts and spin up instances to mine.

Google can afford to eat the cost and perhaps has better heuristics to detect mining. And they definitely have better data to detect a single user signing up for multiple accounts.

Perhaps, or they viewed credits as payment history? I’m not defending the algorithm as even DO has said it was a false-positive. I just wanted to point out that this wasn’t an attack by DO on paying for crypto. That it specifically was trying to look at non-payers.
Having your instances run at 100% CPU pretty much raises a red flag at any cloud provider. Depending on your plan it either gets shut off (like in this case) or you get a notice about "suspicious" behavior and a bit of time to fix the "issue".
What's next? Having your disks use too much I/O causes the same response? Or actually using the RAM you pay for?

I run my own iron, with cloud only for elastic loads. Every time I launch a cloud instance, it will be using 100% CPU, otherwise I wouldn't launch it. It's unacceptable to label that profile as "suspicious". It never happened to me on AWS or Azure.

> ...you pay for

The major indicator here was the lack of payment history, so they hadn’t paid for it but were working off of credit. I think it’s a nuance that’s very important.

I'm sorry to dig in my heels, but that's no excuse. If the credit they were given allowed them to use the resources, it follows that using the resources is not a breach of contract.

From the description I imagine Digital Ocean offers a free period or tier, to reduce friction in customer acquisition. This is a marketing tool, and must not, in any way, cause situations like the one described.

If a marketing tool induces service failure, it has no place in a professional setting.

Credit and promo codes are also used extensively for fraud. If a business had been in operation for a while solely on credit, it may well generate a false positive in a fraud detection algorithm if it scaled dramatically.

But it is important to disconnect monetary spending from coupons or vouchers as they are not equivalent.

You mention free tier but that’s not what was at issue here. Also, 10 additional instances isn’t in the free tier of any cloud service I’ve used.

I’m not saying that DO is correct, but I believe the parent argument was a simplification of the events in question. Also, DO’s handling of it via support was far worse than the initial algorithm, imo.

> But it is important to disconnect monetary spending from coupons or vouchers as they are not equivalent.

They must be. If they are not, then you've entered the territory I referred to, where marketing actions are impacting service availability. This impact is not acceptable in professional services.

In this specific case, if voucher giveaways produce ingress of resource leeches (cryptominers that will never result in real customers), and if it is impossible to prevent this undesired ingress without impacting existing customers (which it is), then that marketing action must stop. This is the conclusion I expected from the post-mortem.

This is confusing though, since Digital Ocean credit can come from something like a referral or from prepaying your account, which is something I do to prevent billing overages.
Hardly the point.

Using what you've rightfully obtained shouldn't be regarded with suspicion.

That seems even more hyperbolic. Are you suggesting that no service should attempt to detect fraud?
Of course not.

Are you suggesting that 100% usage implies fraud?

There's a difference between suspecting fraud from high resource usage and equating high resource usage with fraud.

The latter is what is happening here, and it's outrageous.

I can assure you we run AWS instances at 100% with no problem at all. (Well, no problem from AWS; sometimes it's caused by a software bug.)
That's not true. A proper cloud provider (AWS or Azure) would not bat an eye because the CPU is frequently pegged at 100%.
That sounds odd to me. Especially given that DigitalOcean is the default dynamic provider for GitLab CI builds which _will_ run droplets at 100% CPU.
From what I understand, they’re not saying you’re not allowed to use 100% for (what their user agreements define as) legitimate uses. They’re saying several droplets suddenly created and immediately going to 100% flags them as suspicious activity for human review. Looks like after such review, they would flag them as legitimate and all would be fine, 100% CPU or not.

They’ve botched that second step though.
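
For illustration only, the "flag for human review" step described above might look something like the Python sketch below. This is not DO's actual algorithm: the field names and thresholds are hypothetical, and the signals are just the ones mentioned in this thread (a burst of new droplets, CPU pegged right away, no payment history).

```python
from dataclasses import dataclass


@dataclass
class AccountActivity:
    new_droplets_last_hour: int   # droplets created in the recent burst
    avg_cpu_utilization: float    # 0.0 to 1.0, averaged over the new droplets
    has_payment_history: bool     # False if the account has only run on credits


def needs_human_review(
    activity: AccountActivity,
    burst_threshold: int = 5,     # hypothetical cutoff for "several droplets at once"
    cpu_threshold: float = 0.95,  # hypothetical cutoff for "pegged" CPU
) -> bool:
    """Return True when every signal lines up; the outcome is a review, not a suspension."""
    burst = activity.new_droplets_last_hour >= burst_threshold
    pegged = activity.avg_cpu_utilization >= cpu_threshold
    return burst and pegged and not activity.has_payment_history


# Credits-only account that suddenly launches 10 busy droplets: flagged for review.
print(needs_human_review(AccountActivity(10, 1.0, has_payment_history=False)))
# Same workload on an account with payment history: not flagged.
print(needs_human_review(AccountActivity(10, 1.0, has_payment_history=True)))
```

The point of the sketch is that high CPU alone never trips the flag, and that what happens after the flag (review, confirmation, suspension) is a separate decision.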

That doesn't make sense to me. You pay for the time you have the droplets running, so it seems kind of silly to have them sit idle for a bit before you give them work to do.
I don't work at a cloud provider, but I think the reasoning is:

It's a common pattern in malicious actors to immediately spin up several droplets and immediately peg the CPU on each one.

There are, obviously, non-malicious actors who do the same, but it's a bit like wearing a balaclava in public: Likely to raise some suspicion just because it's associated with malicious actors.

Not sure what the materialization of that suspicion might look like -- competitors trying to crush DO's business? mass account creation or mass fraudulent logins? "mining crypto"? What I could come up with felt like quote-unquote legit grounds for a timed suspension, but only instinctively so.
> I'd expect to be able to use the resources I bought without risk of being suspended

They weren't bought resources at the time, they were on credit. In this case a false positive for sure.

In the case of an actual cryptominer it's more likely they'll just ditch the account when it comes to billing time. Even more likely is that it's a compromised account that someone else has to pay for.