by dap 2014 days ago
I have seen similar stories with AWS. It’s somewhat shocking to me that there’s no way to ask to get cut off above some dollar limit. Is every customer risking unbounded liability?
7 comments

Billing is traditionally not built into critical paths; it runs on async or even batch processing of logs from those systems. I doubt any system at Amazon really knows what you’ve spent until after the fact.
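To make the lag concrete, here is a minimal sketch of why a spending cap is hard to enforce when billing only sees usage after a batch job runs. Everything here is hypothetical: the function name, the event format, and the 60-second batch interval are illustrative assumptions, not how any provider actually works.

```python
def spend_seen_by_billing(events, now, batch_interval):
    """Total cost visible to billing, which only processes logs at batch boundaries.

    `events` is a list of (timestamp, cost) pairs; all values are made up.
    """
    last_batch = (now // batch_interval) * batch_interval
    # Only usage logged before the most recent batch run has been counted.
    return sum(cost for ts, cost in events if ts < last_batch)


events = [(0, 5.0), (30, 5.0), (70, 500.0)]  # a runaway job starts at t=70
actual = sum(cost for _, cost in events)
seen = spend_seen_by_billing(events, now=100, batch_interval=60)
print(f"actual spend: ${actual:.2f}, billing has seen: ${seen:.2f}")
```

A cap of $100 checked at t=100 would not trigger here, because the expensive event has not been processed yet.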
I think that’s true, but that’s a reflection of what people considered important when they built it. It’s not the only way to build things.
The liability is on Google's side, mostly. There are hard limits in terms of the number of instances you can create without deliberately asking to spend more money, and these hard limits are set based on what Google is willing to write off for an overnight mistake.
Dollar limits don't make sense for companies because you can't predict which parts of your infrastructure would get shut down first. Hobbyists don't mind if everything is turned off but they would still want to keep the stored data. There would still be an opportunity to be overcharged on storage even with a dollar limit.

However, setting usage limits would be a solution for both companies and hobbyists. AWS could then calculate the maximum spending per month that is possible with the current settings. I bet they would never build such a calculator and the necessary usage limits because it makes it easier for customers to optimize costs.
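As a rough sketch of the calculator described above: if every resource type has a usage limit and a known rate, the worst-case monthly bill is just the sum over all limits. The `UsageLimit` type, the resource names, and the rates below are made up for illustration and do not reflect real AWS pricing or quotas.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 24 * 31  # worst case: a 31-day month


@dataclass
class UsageLimit:
    """A hypothetical per-resource cap: at most `max_units`, each billed hourly."""
    name: str
    max_units: int
    hourly_rate: float  # dollars per unit per hour (made-up figure)


def max_monthly_spend(limits):
    """Upper bound on the monthly bill if every limit is saturated all month."""
    return sum(l.max_units * l.hourly_rate * HOURS_PER_MONTH for l in limits)


limits = [
    UsageLimit("small instances", max_units=10, hourly_rate=0.10),
    UsageLimit("storage (per 100 GB)", max_units=5, hourly_rate=0.02),
]
print(f"worst-case monthly bill: ${max_monthly_spend(limits):.2f}")
```

This ignores anything billed per request or per byte transferred, which would need its own limits before the bound means anything.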

Well, I suggested an option. That doesn’t mean companies have to choose it.

But to be honest, it’s somewhat surprising that companies are willing to take on the risk of unbounded financial liability if someone makes a mistake.

There are too many services that have continuous billing. The only way to hard stop is to start deleting compute and data.

Most AWS users would rather lose money than data and service for their customers, and bills are easier to negotiate than trying to recover your infrastructure.

The in-between approach is to create rate limits (either per sliding scale or total), which exist for some products but are probably too complicated to implement for everything.
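For what it's worth, a rate limit on spend could look something like the sketch below, which I'm reading as a sliding time window. The class name, the window size, and the dollar figures are all hypothetical; this is one way to do it under those assumptions, not a description of any existing product.

```python
from collections import deque


class SlidingWindowBudget:
    """Rejects a charge if spend within the last `window_seconds` would exceed `limit`."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self._charges = deque()  # (timestamp, amount), oldest first
        self._total = 0.0

    def try_charge(self, amount, now):
        # Drop charges that have aged out of the window.
        while self._charges and self._charges[0][0] <= now - self.window_seconds:
            _, old = self._charges.popleft()
            self._total -= old
        if self._total + amount > self.limit:
            return False  # refuse the new request; nothing existing is deleted
        self._charges.append((now, amount))
        self._total += amount
        return True


budget = SlidingWindowBudget(limit=100.0, window_seconds=3600)
print(budget.try_charge(60.0, now=0))     # True
print(budget.try_charge(60.0, now=10))    # False: would reach 120 within the hour
print(budget.try_charge(60.0, now=3600))  # True: the first charge has aged out
```

Note that this only gates new requests. It does nothing about resources that bill continuously once they exist, which is the hard part raised above.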

Taking AWS for example, the interface for the site is quite good overall with the suspicious exception of the billing pages, which are completely mysterious and unusable. What a weird coincidence, not a dark pattern at all.
If I were in AWS/GCP's position, I would prefer to send alerts rather than turn off services.

Shutting off services can mean destroying the customer's data with no way for them to recover it. That could be from terminated ephemeral disks, or a terminated database, or cutting off a critical upload stream into their instances.

It's a lot easier to reduce/forgive a bill when a customer makes a mistake than to recover their lost data (or loss to their business).

Can’t you just stop spinning up new services and suspend running ones including connections and db accesses? Start with bandwidth?
Stopping DB accesses is a great way to mess up a lot of stuff. Even just stopping connections to the live website would be full of potential issues. Should it stop all access or still allow access to the administrative interface? What if the root cause of the billing overage is a report by finance or someone running a large job on EMR? Should that pull the plug on the website?

What if the new services that are being suspended are writes to queuing systems that are used for order fulfillment or other business processes, should we drop these orders on the floor?

It's much easier to handle it post facto, and write off the expense on the cloud provider side, which doesn't cost them that much anyway. There are some guard rails that prevent people from doing catastrophic things that they can't write off (e.g. taking all of the compute capacity of a region for hours on end, preventing other customers from actually using it) using limits that require manual intervention to be raised.

You can get billing alerts today. It’s not the lack of control mechanisms that caused this failure. It was the noobism of the user.
The overspend was caused by an application bug, not because the user was a "noob".
What about prepaid credit cards with a payment limit? If the payment fails, will the service be terminated? Or does AWS continue and send an invoice anyway?
They still send the invoice. I accidentally left a couple of small files in an S3 bucket for years. Eventually my card expired on that account and they continued bothering me until I contacted support and we agreed to waive the bill.

I imagine they would have been more forceful if it was a larger bill.

Google once (falsely?) considered my Revolut debit card a pre-paid card and refused to accept it for GCS billing. The error message wasn't anything generic either - it stated specifically that pre-paid cards are not accepted.
Aren't Revolut's cards prepaid? I was under the impression it's not a full, real bank account since they make it easy to transfer money in and out. I imagine they're making money off the issuing bank part of the interchange fees.
> What about prepaid credit cards with a payment limit? If the payment fails, will the service be terminated?

Never used one for this purpose but since billing happens after the fact (and monthly), AWS won't be aware of the limit until after the monthly billing occurs. They’ll just tell you the card failed and you have a billing liability to take care of (and give you some time to fix it) while still letting you rack up additional debt with services running after the billing fails; they definitely won't cut you off when you've reached the level that would meet the limit on your card (and couldn't even in theory without realtime notification of other charges against the card, even if they were inclined to.)
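A toy model of the point above, assuming usage is totalled first and the card is only tried once at month end. The function name and the all-or-nothing decline behaviour are my assumptions for illustration, not documented AWS behaviour.

```python
def settle_month(usage_cost, card_limit):
    """Post-paid billing: usage is totalled first, then the card is tried once.

    Returns (amount_charged, amount_still_owed). Assumes the card declines the
    whole charge when it exceeds the limit, with no partial payment.
    """
    if usage_cost <= card_limit:
        return usage_cost, 0.0
    return 0.0, usage_cost


charged, owed = settle_month(usage_cost=2500.0, card_limit=100.0)
print(charged, owed)  # 0.0 2500.0
```

The card limit caps what can be collected automatically, but not what you owe.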