by scotchio 2567 days ago
That’s a really fair and reasonable response. Not sure what else people really expect here.

> The template used for response in account denial will be removed entirely. If account access is denied during an appeal, which often is the case as most appeals are true bad actors, the agent must create a reasoned response.

Glad this is seen as an issue and corrected.

IMO, this probably would have made this whole thing never escalate if a better response was previously in place for everyone.

Accidents, shotty support, whatever — all expected these days unless you have big cash money agreements in place.

But to kill an account of a responsive person with a gigantic middle finger email without reasoning was a pretty dumb process in place. You can see the email on the Twitter thread somewhere.

Glad it’s fixed! Still a DO fan here

Edit: TALKING ABOUT THIS: https://pbs.twimg.com/media/D76ocofXoAY_xB5.png

7 comments

> That’s a really fair and reasonable response. Not sure what else people really expect here.

The root cause of suspension is incomprehensible to me. They were suspended because they launched a set of instances and these were using 100% CPU. How is that unreasonable and cause for suspension?

I'm not a Digital Ocean customer, but if I were, I'd expect to be able to use the resources I bought without risk of being suspended. This is the root cause. It was compounded by incompetent customer support, but I really do not understand the suspension cause.

The response tackles all secondary factors, but does not talk about the root cause. I'd expect it to.

Agreed. They say in the postmortem that it was protection against crypto mining, but what kind of weird reason is that?! If I want to pay for 10 instances mining crypto, why the hell wouldn't I be allowed to do that? I don't see why they should block any workload as long as the credit card details are valid.
It's a customer protection method. Most cryptominers are not using accounts they pay for. They compromise customer accounts and spin up resources. If you aren't proactive about communicating this to customers or blocking it, it can be quite some time before the customer notices and almost all customers will request a refund - even when the attack is a compromised password / successful phish on the customer's side.

Additionally, all cloud providers operate on various models of over-subscription. It is not in anyone's (customer / provider) interest to allow the full consumption of resources when the activity is fraudulent.

As you can see in the post-mortem, they are fine with the usage. They have a process and flag to allow legitimate customers to use their resources. However, based on previous experience at another cloud provider, I would bet that over 90% of those automated hits are correct.

This was bad support. They know that and they seem to be making the right moves to fix it. Fraud is bad for everyone and has to be combated. Not doing so can raise prices and kill a business like DO. I'm sure they feel awful that a customer was so poorly impacted, but the error wasn't in the first ban, it was everything after that.

Part of the whole issue here revolves around shared hosting, in my opinion. Host hardware is so oversold that one customer utilizing 100% CPU affects a handful of other customers badly enough that it's not allowed at all. I have seen providers that have terminated services for less than 100% CPU usage; a constant 90% is enough on some of them. But thanks to the profit margins of shared hosting, providers are able to charge incredibly low prices per instance and oversell their hardware, sometimes by as much as 10 to 20 times. That's as many as four hundred customers on a box that should maybe have 20 if it weren't oversold at all. In this case it really is an instance of you get what you pay for. The service we provide has no oversold hardware, and all plans are dedicated. Some people are initially very turned off by the pricing, but being able to let customers mine if they want to without affecting a single other customer on the platform, so that each customer gets the same experience regardless of anyone else's resource utilization, leads to much happier customers, even if it means smaller profit margins for us. At the end of the day, customer experience and support are two of the most important factors in running a hosting provider. While I disagree with aspects of DigitalOcean's business model as a shared hosting provider, I do think that the response to this was more than appropriate and better than would be expected of a lot of shared hosting providers, provided they actually implement the things talked about in the response.
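
To put rough numbers on the overselling point: a minimal sketch of the arithmetic, using the 20-customer box and the 10x to 20x factors from the comment above. The function names are made up for illustration, and the worst-case share assumes every tenant pegs the CPU at the same moment, which is a deliberate simplification.

```python
def customers_on_host(dedicated_capacity: int, oversell_factor: int) -> int:
    """Customers placed on a host that is sized for `dedicated_capacity` tenants."""
    return dedicated_capacity * oversell_factor


def worst_case_share(oversell_factor: int) -> float:
    """Fraction of the advertised CPU each tenant gets if all tenants run at 100% at once."""
    return 1.0 / oversell_factor


if __name__ == "__main__":
    # Hypothetical host sized for 20 dedicated tenants, at 1x, 10x and 20x overselling.
    for factor in (1, 10, 20):
        print(factor, customers_on_host(20, factor), f"{worst_case_share(factor):.0%}")
```

At 20x, the box sized for 20 holds 400 customers, and each one is down to a twentieth of what they were sold if everybody actually uses it.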
When you say we/us as a more expensive, but dedicated alternative, what is the cost difference as a percentage for say a small project?

Edit: found your site, looks like you’re cheaper than aws at a glance

If it was just the first point, the customer should be able to confirm that the activity was intended without even going through human review. It should be like when your bank texts you to confirm an odd transaction. They don't simply lock your account.

It sounds much more like it was the second point, which is unsettling. It's one thing to plan your pricing based on the assumption that most customers won't maximally-utilize. It's another thing to enforce a soft-limit that's vague and below what was advertised. I'd much rather have a lowered, known limit than whatever this is.

I totally agree, but unfortunately my bank (a major U.S. bank) does block a transaction and sometimes lock my credit card completely when they think the transaction is suspicious. There’s no confirmation mechanism, I have to call them to get the card unlocked. Of course, this usually gets resolved within five minutes (except that one time when I had to renew a .ng domain, and the Nigeria-originated transaction got auto-blocked three times in a row, and eventually the case had to be escalated to override their security mechanism entirely), not 29 hours.
> They don't simply lock your account.

Capital One did this to me once, and refused to restore the use of the blocked account even after I immediately called them and confirmed that the transaction that triggered the block was not fraudulent.

It was in combination with a lack of payment history. So if they had been paying it would not have triggered but they had been working off of credits instead. I think this point addresses your concern that paying customers should be allowed to mine.
I ran some really long compute jobs on GCP (100% CPU for weeks across many vCPUs) with credits without getting flagged. I was evaluating FFTW performance for a project. Perhaps GCP could tell I was calling into FFTW and not mining so they decided it wasn't fraud?
It makes sense for a company like DO to not allow crypto miners to use credits. Or else they would develop elaborate systems to create fake accounts and spin them up to mine.

Google can afford to eat the cost and perhaps has better heuristics to detect mining. And they definitely have better data to detect a single user signing up for multiple accounts.

Perhaps, or they viewed credits as payment history? I’m not defending the algorithm as even DO has said it was a false-positive. I just wanted to point out that this wasn’t an attack by DO on paying for crypto. That it specifically was trying to look at non-payers.
Having your instances run at 100% CPU pretty much raises a red flag at any cloud provider. Depending on your plan it either gets shut off (like in this case) or you get a notice about "suspicious" behavior and a bit of time to fix the "issue".
What's next? Having your disks use too much I/O causes the same response? Or actually using the RAM you pay for?

I run my own iron, with cloud only for elastic loads. Every time I launch a cloud instance, it will be using 100% CPU, otherwise I wouldn't launch it. It's unacceptable to label that profile as "suspicious". It never happened to me on AWS or Azure.

> ...you pay for

The major indicator here was the lack of payment history, so they hadn’t paid for it but were working off of credit. I think it’s a nuance that’s very important.

I'm sorry to dig my heels in, but that's no excuse. If the credit they were given allowed them to use the resources, it follows that using the resources is not a breach of contract.

From the description I imagine Digital Ocean offers a free period or tier, to reduce friction in customer acquisition. This is a marketing tool, and must not, in any way, cause situations like the one described.

If a marketing tool induces service failure, it has no place in a professional setting.

Credit and promo codes are also used extensively for fraud. If a business had been in operation for a while solely on credit, it may well generate a false positive in a fraud detection algorithm if it scaled dramatically.

But it is important to disconnect monetary spending from coupons or vouchers as they are not equivalent.

You mention free tier but that’s not what was at issue here. Also, 10 additional instances isn’t in the free tier of any cloud service I’ve used.

I’m not saying that DO is correct, but I believe the parent argument was a simplification of the events in question. Also, DO’s handling of it via support was far worse than the initial algorithm, imo.

This is confusing though, since Digital Ocean credit can mean like a referral, or by prepaying your account - something I do to prevent billing overages.
Hardly the point.

Using what you've rightfully obtained shouldn't be regarded with suspicion.

That seems even more hyperbolic. Are you suggesting that no service should attempt to detect fraud?
I can assure you we run AWS instances at 100% with no problem at all. (Well, no problem from AWS; sometimes it's caused by a software bug.)
That's not true. A proper cloud provider (AWS or Azure) would not bat an eye because the CPU is frequently pegged at 100%.
That sounds odd to me. Especially given that digitalocean is the default dynamic provider for Gitlab CI builds which _will_ run droplets at 100% CPU.
From what I understand, they’re not saying you’re not allowed to use 100% for (what their user agreements define as) legitimate uses. They’re saying several droplets suddenly created and immediately going to 100% flags them as suspicious activity for human review. Looks like after such review, they would flag them as legitimate and all would be fine, 100% CPU or not.

They’ve botched that second step though.
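
For what it's worth, the two-step flow described here (automated flag, then human review) could look something like the sketch below. This is not DO's actual algorithm; the thresholds, the `Droplet` type, and the function name are all hypothetical. The only things taken from the thread are the signals: several droplets created at once, CPU pegged right away, and no payment history on the account.

```python
from dataclasses import dataclass


@dataclass
class Droplet:
    created_at: float   # creation time, seconds since epoch
    cpu_percent: float  # average CPU utilization since creation


def needs_human_review(droplets, has_payment_history, now,
                       window_s=3600.0, min_burst=5, cpu_threshold=95.0):
    """Return True if the account should be queued for human review.

    All thresholds are made-up defaults for illustration. Note that the
    result is a review flag, not a suspension decision.
    """
    # Accounts with a payment history are not flagged by this heuristic.
    if has_payment_history:
        return False
    # Droplets created within the recent window.
    recent = [d for d in droplets if now - d.created_at <= window_s]
    # Of those, the ones already pegged at (or near) 100% CPU.
    pegged = [d for d in recent if d.cpu_percent >= cpu_threshold]
    return len(pegged) >= min_burst
```

The point of returning a flag rather than locking anything is that the second step, the review, is where a legitimate workload is supposed to get cleared.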

That doesn't make sense to me. You pay for the time you have the droplets running, so it seems kind of silly to have them sit idle for a bit before you give them work to do.
I don't work at a cloud provider, but I think the reasoning is:

It's a common pattern in malicious actors to immediately spin up several droplets and immediately peg the CPU on each one.

There are, obviously, non-malicious actors who do the same, but it's a bit like wearing a balaclava in public: Likely to raise some suspicion just because it's associated with malicious actors.

Not sure what the materialization of that suspicion might look like -- competitors trying to crush DO's business? mass account creation or mass fraudulent logins? "mining crypto"? What I could come up with felt like quote-unquote legit grounds for a timed suspension, but only instinctively so.
> I'd expect to be able to use the resources I bought without risk of being suspended

They weren't bought resources at the time, they were on credit. In this case a false positive for sure.

In the case of an actual cryptominer it's more likely they'll just ditch the account when it comes to billing time. Even more likely is that it's a compromised account that someone else has to pay for

I can't really fault their postmortem or their response on HN. The corrections are all good, but the very fact that these things need to be corrected (automatically locking the entire account when there is a compute spike, having such a casual review process before permanently denying access to an account, not having 24/7 support after locking an account, etc.) makes you question their overall maturity as a B2B infrastructure provider.
Sure it's better to never make a mistake but so long as they don't make a habit of things like this I'm not going to think anything of it until I see more cracks in the wall.

A screw up is inevitable. A mature response is not. So the fact they gave a mature response goes a long way. Although it's unfortunate that social media seems to be their emergency support channel...

> Sure it's better to never make a mistake but so long as they don't make a habit of things like this

This is the thing - the customer that got locked out managed to get attention on HN, Reddit and other media - this seems to have prompted action from Digital Ocean.

How many have silently fallen victim before this? We don't really know if this is a habit or not - we only know this one customer was corrected.

Based on this post, Digital Ocean is taking specific measures company wide to prevent similar issues from affecting any customer in the future. So they did not just correct the situation for this one customer.
> declined to activate it

Except they were declining to unlock it, right? I’m always shocked to see support that’s so pitiful they don’t even bother to have a correctly worded template for a common event.

The real problem is support reps that aren’t trained properly and don’t even care enough to apply a bit of common sense. Getting rid of a response template doesn’t automatically make the support reps care enough to apply common sense.

How about a “don’t fuck me” support tier where I can pay a one time $100-$250 fee for the sole purpose of getting a phone call before my account gets banned?

The real problem is most definitely not the support rep. They don’t really go off book. This is the process as designed and approved by higher management, not by a low pay first level support (unless you assume they have some top level engineer doing this stuff).

And going off process could make it better... yay, self pat on the back. But it could make it worse in which case I see unemployment in the support rep’s future. So they won’t go that way very often.

Anyone who ever had such a low positioned job knows how it works. At that level your only freedom is to do what you’re told and follow company process.

No, this is the fault of the manager who asked for this process and their manager who approved it. Management isn’t just about picking up a higher paycheck, it’s also to take the accountability for the decisions made under your watch.

> That’s a really fair and reasonable response. Not sure what else people really expect here.

If you nuke VMs, under no circumstances do you also nuke access to data, backups, etc.

Because if it wasn't for "social escalation" (aka: mob justice via HN and Twitter), this 2 person company would have lost everything.

If you terminate a customer for $reasons, the data still belongs to the user, and not the company. And the company should still be legally required to provide the data on a reasonable timescale, like FTP access for 7 days.

While you're swinging that legal word around, have an armchair lawyer skim DO's Terms of Service.

> 9.1 Subscriber is solely responsible for the preservation of Subscriber's data which Subscriber saves onto its virtual server (the "Data"). EVEN WITH RESPECT TO DATA AS TO WHICH SUBSCRIBER CONTRACTS FOR BACKUP SERVICES PROVIDED BY DIGITALOCEAN, TO THE EXTENT PERMITTED BY APPLICABLE LAW, DIGITALOCEAN SHALL HAVE NO RESPONSIBILITY TO PRESERVE DATA. DIGITALOCEAN SHALL HAVE NO LIABILITY FOR ANY DATA THAT MAY BE LOST, OR UNRECOVERABLE, BY REASON OF SUBSCRIBER'S FAILURE TO BACKUP ITS DATA OR FOR ANY OTHER REASON.

Summary: Do offsite backups n'all you dinguses

You're right. It's good to have some expectations of the company, but customers really need to take the TOS seriously.
Yet again, the ToS is hidden crap that goes against the things advertised up front, like "Backups".

Sure the ToS needs legalese crap for the lawyers, but a plain version also needs to be made. I'm certainly no lawyer, and nor are most people.

It’s not really hidden and they do have an easy to read non-lawyer summary underneath the part I quoted

> In other words, we trust that you’ll be responsible and back up your own data. Things happen!

It doesn't take a college degree, in law or otherwise, to understand that data in one place - whether that's physically or under the umbrella of a single service provider - is subject to unpredictable, unexpected, total loss.
I agree it's a generally good response. There are a few more things I'd like to see more clearly addressed:

* While the removal of the account termination template is good, in conjunction with additional hiring to support more attention to any individual ticket, I can't tell by whose standards the "reasoned response" is gauged, or if the response is reviewable at all. I did note that they now want two human reviewers, but that's distinct from specifying a process in which a reasoned response is articulated and reviewed.

* More importantly, if the reasoned response doesn't pass muster with the customer, what's their resort? Still Twitter-shaming? I suppose that's legit if they'd rather their mistakes were public like this.

* The question of whether an account-wide lockout w/ no data retrieval is a necessary/proper consequence for those flagged for CPU abuse needs addressing -- ideally they should have a different policy that allows for data egress (with bandwidth fees, if necessary), but if not, a rationale and clear policy might be acceptable.

Back in the days before Twitter, folks wrote to the CEO or other senior executives as a last resort. Might still be effective in some cases.
> shotty support

"shoddy", for what it's worth.

Woop. TIL - Thanks!
How tremendously forgiving.