Hacker News
by appstorelottery 2278 days ago
Every morning I would check AWS billing just out of habit. I'm just thankful I did - otherwise everything would have kept running...

The lesson for me was: don't trust your internal, hacked-together instance management system. The AWS interface to storage and instances is the ground truth. And perhaps more importantly, I'm never getting into another startup which has financial risk like that without being a core expert in that risk/tech. I was focused on the business and client code, and had very little clue about the nitty-gritty of AWS. I should have been more involved with the code on that side, or at least with the data-flow architecture.


SRE here. I feel for your situation. Here's some advice. One simple thing you could do is set up AWS billing alarms and have them delivered to a notification app like PagerDuty.

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitori....
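If you'd rather script the alarm than click through the console, here's a rough boto3 sketch. It builds the arguments for CloudWatch's `put_metric_alarm` on the `AWS/Billing` `EstimatedCharges` metric (month-to-date charges for the account). It assumes billing alerts are already enabled in the account's billing preferences and that you already have an SNS topic. The alarm name and the example threshold and ARN are placeholders.

```python
# Sketch: request parameters for a CloudWatch billing alarm.
# Assumptions: billing alerts are enabled for the account, an SNS topic already
# exists (pass your own ARN), and boto3 is installed if you actually create it.


def billing_alarm_request(threshold_usd: float, sns_topic_arn: str,
                          name: str = "monthly-spend-alarm") -> dict:
    """Keyword arguments for CloudWatch's put_metric_alarm."""
    return {
        "AlarmName": name,
        # Billing metrics are published in the AWS/Billing namespace, in us-east-1.
        "Namespace": "AWS/Billing",
        # Month-to-date estimated charges for the whole account.
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # 6 hours; the estimate only updates a few times a day
        "EvaluationPeriods": 1,
        "Threshold": threshold_usd,
        "ComparisonOperator": "GreaterThanThreshold",
        # SNS can deliver to email, SMS, or a PagerDuty integration endpoint.
        "AlarmActions": [sns_topic_arn],
    }


def create_billing_alarm(threshold_usd: float, sns_topic_arn: str) -> None:
    import boto3  # lazy import: building the request itself needs no AWS access
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    cloudwatch.put_metric_alarm(**billing_alarm_request(threshold_usd, sns_topic_arn))
```

To create it, you'd call something like `create_billing_alarm(500.0, <your SNS topic ARN>)`. Pick the threshold well below the amount that would actually hurt, because the estimate lags real usage.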

If you don't want to pay for PD, you can patch together any number of ways to get your phone to scream and holler when it gets an email from ohshit@amazonaws.com. It's also good to have clear expectations as to whose responsibility it is to deal with problem x between the hours of y and z and exactly what they are supposed to do.
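One way to make "problem x between the hours of y and z" concrete is to write the rotation down as data and look it up when an alert fires. This is a minimal sketch. The people, hours, and runbook text are all made up, and a real setup would live in whatever paging tool you use.

```python
# Sketch of an on-call table: who handles which alert, during which hours,
# and what they are expected to do. All names and runbook text are hypothetical.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rotation:
    alert: str       # the problem this rotation covers, e.g. "billing"
    start_hour: int  # inclusive, 0-23
    end_hour: int    # exclusive, 0-23; smaller than start_hour means it wraps past midnight
    owner: str
    runbook: str     # exactly what the owner is supposed to do

    def covers(self, hour: int) -> bool:
        if self.start_hour <= self.end_hour:
            return self.start_hour <= hour < self.end_hour
        # Overnight window such as 18:00-09:00.
        return hour >= self.start_hour or hour < self.end_hour


def who_handles(alert: str, hour: int, rotations: List[Rotation]) -> Optional[Rotation]:
    for rotation in rotations:
        if rotation.alert == alert and rotation.covers(hour):
            return rotation
    return None  # a gap in coverage is itself worth fixing


ROTATIONS = [
    Rotation("billing", 9, 18, "alice", "Compare running instances with the inventory; stop strays."),
    Rotation("billing", 18, 9, "bob", "Stop strays first, ask questions in the morning."),
]
```

Returning `None` for an uncovered alert/hour makes gaps visible instead of silently dropping the page.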

Keep the alerts restricted to the really important stuff, because if your team becomes overloaded with useless alerts they will 1) dislike you and 2) be more prone to mistaking a five-alarm fire for a burnt casserole.
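A minimal sketch of "only page for the important stuff": collapse repeats of the same alert, page only at or above a severity cutoff, and send everything else to a digest. The severity levels and alert names here are illustrative assumptions, not any particular tool's scheme.

```python
from enum import IntEnum
from typing import Dict, Iterable, List, Tuple


class Severity(IntEnum):
    INFO = 0
    WARNING = 1
    CRITICAL = 2


def route(alerts: Iterable[Tuple[str, Severity]],
          page_at: Severity = Severity.CRITICAL) -> Tuple[List[str], List[str]]:
    """Split alerts into pages (wake someone up) and a digest (read tomorrow).

    Repeats of the same alert key collapse into one entry at their worst
    severity, so one ongoing problem produces one page rather than fifty.
    """
    worst: Dict[str, Severity] = {}
    for key, severity in alerts:
        worst[key] = max(severity, worst.get(key, Severity.INFO))
    pages = [key for key, sev in worst.items() if sev >= page_at]
    digest = [key for key, sev in worst.items() if sev < page_at]
    return pages, digest
```

The billing alarm belongs in the "page" bucket. A CPU blip at 3am almost certainly doesn't.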

There are more complex systems you could build, but that's a start.

Thank you for this. How can anyone run ANY service with ANY company without adding a clause to the contract (and then having the alerts up and running) for controlling costs?

I remember PagerDuty was advertising (a lot) on Leo Laporte's podcasts a few years back.

A clause in the contract: if the monthly bill reaches $Xk, then:

(a) seek written approval by client, and

(b) continue until $Yk or approval is given with a new ceiling price.
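The clause maps cleanly onto a small check you could run against month-to-date spend. In this sketch, the class and method names are hypothetical. So is the choice to have an approval set both a new ceiling and a new point at which to ask again. Nothing here talks to AWS.

```python
# Sketch of the $Xk / $Yk clause as a check against month-to-date spend.
# Dollar figures are placeholders.
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    REQUEST_APPROVAL = "continue, but seek written approval from the client"
    STOP = "stop until a new ceiling is approved"


@dataclass
class SpendClause:
    notify_at: float  # $X: seek written approval from the client
    ceiling: float    # $Y: don't go past this without approval

    def __post_init__(self) -> None:
        if self.notify_at > self.ceiling:
            raise ValueError("notify_at must not exceed ceiling")

    def check(self, month_to_date: float) -> Action:
        if month_to_date >= self.ceiling:
            return Action.STOP
        if month_to_date >= self.notify_at:
            return Action.REQUEST_APPROVAL
        return Action.CONTINUE

    def approve(self, new_notify_at: float, new_ceiling: float) -> None:
        """Record written approval of a new ceiling (and the next point to ask again)."""
        if new_notify_at > new_ceiling:
            raise ValueError("new_notify_at must not exceed new_ceiling")
        self.notify_at, self.ceiling = new_notify_at, new_ceiling
```

For example, with `SpendClause(notify_at=5000, ceiling=8000)`, a $6,000 month means "keep going but get it in writing", and $8,000 means stop until the client signs off on a higher ceiling.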

I was just playing around with AWS a while ago and was surprised that I could not find any option to put a cap on the amount I'd spend in a month. Only thing I could do was set up alerts.

I imagine AWS would have 0 problems suspending all my services if I can't pay, so why can't it do the same thing when it reaches my arbitrary cap?

> I'm never getting into another startup which has financial risk like that without being a core expert in that risk/tech

This may go unstated, but unless you also had access to fix whatever was wrong, being an expert in it wouldn't really help all that much. I've been in situations where I had explicit/expert knowledge of XYZ, but when the people responsible for XYZ don't take your input and/or don't give you the ability to fix a problem, expert knowledge is useless (or worse, it's like watching a train wreck happen when you know you could have stopped it).

This. But on the other hand, you can be ready with the popcorn when shit eventually does hit the fan.
And then have to live with asking yourself "could I have done more?"
As in beer and crisps? /s
"...could I have saved the day if I were willing to loudly complain until someone listened?"
On the other hand, it sounds like you hired someone who wasn't really up for the level of responsibility given. :(

In theory ;), you shouldn't have to be a core expert in everything. But yeah... in the real world, things aren't so cut and dry. :/

TBH, the real problem is that AWS bills cannot be capped in any way (you can set up an alarm, though). It's unreasonable to expect that a programmer won't make mistakes.
Of course they can be capped, you just turn off the services. If you're asking them to automate that for you, then the counterpoint would be people accidentally setting a budget that wipes out their resources and complaining about that.

Easier for both sides to just ask AWS for a refund if there's a reasonable case.

> the counterpoint would be people accidentally setting a budget that wipes out their resources and complaining about that.

This wouldn't be an issue if it was configurable.

Mistakes will always be an issue. How you recover is more important.

Would you rather make a mistake leading to a big bill with the possibility of a refund or set your max budget and have your resources permanently deleted?

There would be no need to delete existing resources. Just prevent me from creating new ones until action is taken. For small projects in particular, I'd much rather have service taken offline and an email notification than even a $1000 bill. And $1000 is small in the scale of what you could end up with on AWS.
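"Prevent me from creating new ones" can be approximated today with an explicit-deny IAM policy, since an explicit Deny overrides any Allow. This is a sketch only. AWS won't attach it for you; you (or a script triggered by the billing alarm) would have to attach it, for example, to the developers' role. The `Sid` and the short action list are illustrative, and you'd extend the list to the services you actually use.

```python
# Sketch: an IAM policy that blocks creating new resources while leaving
# existing ones (and their data) untouched.
import json
from typing import Iterable


def freeze_policy(actions: Iterable[str]) -> dict:
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "BudgetExceededDenyCreate",
                "Effect": "Deny",  # an explicit Deny overrides any Allow
                "Action": sorted(set(actions)),
                "Resource": "*",
            }
        ],
    }


CREATE_ACTIONS = ["ec2:RunInstances", "rds:CreateDBInstance", "lambda:CreateFunction"]

if __name__ == "__main__":
    print(json.dumps(freeze_policy(CREATE_ACTIONS), indent=2))
```

Note that this only stops new spend from new resources. Anything already running keeps billing until someone acts on the email.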
> Of course they can be capped, you just turn off the services.

That's not a hard cap, since turning off services isn't instant and costs continue to accrue. But yes, there are ways to mitigate the risk of uncapped costs, and they can be automated.
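A back-of-envelope for how far past a "cap" you can go: spend keeps accruing for however long it takes to notice (AWS's billing estimate isn't real-time) plus however long shutdown takes. The burn rate and delays are assumptions you'd estimate for your own setup.

```python
# Back-of-envelope: overshoot when turning things off is neither instant
# nor triggered immediately. All inputs are your own estimates.


def worst_case_overshoot(hourly_burn_usd: float, detection_lag_hours: float,
                         shutdown_hours: float) -> float:
    """Spend that keeps accruing between crossing the cap and everything being off."""
    return hourly_burn_usd * (detection_lag_hours + shutdown_hours)


def alarm_threshold_for(hard_limit_usd: float, hourly_burn_usd: float,
                        detection_lag_hours: float, shutdown_hours: float) -> float:
    """Where to set the alarm so the overshoot still lands under the hard limit."""
    overshoot = worst_case_overshoot(hourly_burn_usd, detection_lag_hours, shutdown_hours)
    return max(0.0, hard_limit_usd - overshoot)
```

For example, at $120/hour with an 8-hour detection lag and 30 minutes to shut down, the overshoot is $1,020. To stay under $2,000, the alarm would need to fire at $980. If the result is 0, the cap is too tight for that burn rate.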

See the sibling comment thread. It's just not that simple. It creates a lot of liability, could lead to permanent data loss, and doesn't really prevent any mistakes either (just swaps them for mistakes in budget caps).

AWS would rather lose some billings than deal with the fallout of losing data or critical service for customers (and in turn their customers).

It depends on the use case. For example, I would like to have developer accounts with a fixed budget that developers can use to experiment with AWS services, but there isn't a great way to enforce that budget in AWS. In this case I don't really care about data loss, since it's all ephemeral testing infrastructure.

In theory I could build something using budget alarms, APIs, and IAM permissions to make sure everything gets shut down if a developer exceeds their budget, but if I made a mistake it could end up being very expensive. Not that I don't trust developers at my company to use such an account responsibly, but it is very easy to accidentally spend a lot of money on AWS, especially if you aren't an expert in it.
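A rough sketch of the "shut it down when a developer exceeds their budget" piece, limited to EC2 in one region. The assumptions are hypothetical: instances carry an `owner` tag, and `spent_usd` comes from your own billing data. Stopped instances still pay for their EBS volumes, and other services aren't touched. It defaults to a dry run, which fits the worry about a mistake in the enforcer itself being expensive.

```python
# Sketch: stop a developer's running EC2 instances once their sandbox budget is spent.
# Hypothetical: the "owner" tag convention and where spent_usd comes from.
# boto3 and credentials are only needed once the budget is actually exceeded.
from typing import List


def instances_to_stop(describe_page: dict, owner: str) -> List[str]:
    """Pick running instances tagged owner=<owner> out of one describe_instances page."""
    ids = []
    for reservation in describe_page.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if instance["State"]["Name"] == "running" and tags.get("owner") == owner:
                ids.append(instance["InstanceId"])
    return ids


def enforce_budget(owner: str, spent_usd: float, budget_usd: float,
                   region: str = "us-east-1", dry_run: bool = True) -> List[str]:
    """Return the instance IDs that were (or, in dry-run mode, would be) stopped."""
    if spent_usd < budget_usd:
        return []
    import boto3
    ec2 = boto3.client("ec2", region_name=region)
    ids: List[str] = []
    for page in ec2.get_paginator("describe_instances").paginate():
        ids.extend(instances_to_stop(page, owner))
    if ids and not dry_run:
        ec2.stop_instances(InstanceIds=ids)
    return ids
```

A real version would loop over every region the developers can use and cover more than EC2. That breadth is where the "if I made a mistake" risk comes from.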

There should be a cap so you have a check. If your system doesn't allow thresholds or assertions, please don't use it. If your cloud system doesn't have a capped budget for you to play within, with alerts when you're about to run out, don't use it.
>In theory ;), you shouldn't have to be a core expert in everything. But yeah... in the real world, things aren't so cut and dry. :/

Right. In my experience, if you don't understand what's going on beneath your abstractions, you're always in for a world of hurt as soon as something goes sideways.

Did you reach out to AWS support or your account manager? They’d definitely have worked something out.
Did you contact AWS and let them know it was a mistake?

They have a good track record of cancelling huge bills the first time they happen

Assuming you were incorporated and had a business account - declare bankruptcy and the bill goes away. I don’t understand why you would still pay the bill if you were going out of business anyway.
Why didn't I file for bankruptcy? This happened in Australia, and declaring bankruptcy was not the right thing to do for many reasons, not least that it makes it much harder to operate as a director of a previously bankrupt company. And in the worst case, my bank would have just gone after me, as I'd given a personal guarantee.
There is no concept of limited liability in Australia?
Even in the United States, most small business loans require personal guarantees which narrowly override the corporate limited liability to make that guarantor liable for that debt if the company doesn't pay. There are some rare exceptions, and possibly more for startups funded by big-name VCs, but I don't know.
But this isn't a small business loan: it's a debt to Amazon.
I read that as the business owner had a preexisting business loan with a personal guarantee.
Except the loan money will go straight to Amazon, and you are now unable to repay the loan to the bank
I’ve worked in many early startups and I’ve never seen anyone use such a loan.
Were they in the US and funded by VCs? That kind of startup probably doesn't need to do this. Unsure about VC-funded businesses elsewhere. Many or even most small businesses without VC funding do take that kind of loan.
You work at the 1%

The real world is filled with barbershops, daycares, bars, clinics, PVC manufacturers etc

None of them get VC money.

When they need money, they go to a bank and usually have to give a PG (personal guarantee) in order to get funds.

Tech startups have it easy. It's all equity. You are not pledging your lifetime earnings on a business idea.

Once tech startups lose their upside potential (prob not anytime soon if ever), you will be sitting with the regular folk, those that pledge their skin and life to their business.

If a director becomes personally bankrupt (for example, by trying to be the good guy and using personal guarantees to take on company debts in an effort to scrape through), then they're banned from running a company until the bankruptcy clears. If they're the director of a company that goes bankrupt, I believe they get 2 chances (companies) before there's a chance of being banned from running more for a time.

Either way it might be nice to keep your options open, depending on your plans.

Or you could just send an email to support and ask them to waive the charges.
If that got to the right person on the right day and they knew it was going to kill the company, it seems likely to help. And combined with the fact that it would probably guarantee future revenue way off into the future...
I have never heard of a case where they wouldn’t give refunds. AWS is competing with the 95% of compute that is not running in the cloud (their own statistics). The last thing they want is a reputation that one mistake will bankrupt a business.
We had spot instances with a mistakenly high bid that incurred thousands overnight when the prices spiked. No refund offered.
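For anyone who hasn't hit this: under the old bid-based spot market, you kept running and paid the market price as long as it stayed at or below your max bid. A mistakenly high bid therefore meant paying straight through a price spike instead of being interrupted. This toy model assumes a persistent request that relaunches when the price drops. The prices are invented, and AWS has since changed its spot pricing model, so treat it as a model of the situation described, not of today's market.

```python
# Toy model of the bid-based spot market: hours where the market price is at or
# below your max price are run and billed at the market price; hours above it
# are interrupted and cost nothing. Prices are invented for illustration.
from typing import Sequence


def spot_cost(hourly_prices: Sequence[float], max_price: float, instances: int) -> float:
    return instances * sum(p for p in hourly_prices if p <= max_price)


overnight = [0.10, 0.10, 2.50, 2.50, 0.10]  # a two-hour spike

sensible = spot_cost(overnight, max_price=0.50, instances=20)   # spike hours interrupted
mistaken = spot_cost(overnight, max_price=10.00, instances=20)  # pays straight through
```

Same fleet, same night: about $6 with a sensible bid versus about $106 with the mistaken one. Scale the fleet or the spike up, and you get to "thousands overnight".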

I know several other companies that had expensive mistakes without refunds. There's probably a complex decision tree for these issues and I doubt anyone really knows outside of AWS.

> I have never heard of a case where they wouldn’t give refunds.

Really? Working in Southern California a few years ago, refund requests were refused ALL THE TIME. This is why there's a common belief that what you are charged you simply owe them, period.

It may be more progressive now, but let's not be revisionist.

Once I got something like a year of EC2 charges retroactively reimbursed for a few instances I hadn't used.
I've repeatedly seen requests of this nature handled by AWS - 75% cuts to billing, 90% cuts even.
This. I work at Amazon and this is more common than you'd expect. "Customer obsession" and all that.
I'm not the type to 'want to speak to the manager' for my self-imposed problems but the more I hear about people coming out ahead the more I think I need to change my ways.
I think you have to think of it a bit more from Amazon's perspective. If you accidentally burn through your entire startup capital and shut down, they lose. If the risk of this sort of thing becomes well-known, then startups will start using other services rather than AWS, and the small fraction that grow big will be less likely to use AWS.

Being an entitled jerk who blames other people for your own negligence is bad, and you shouldn't change that. But openly giving companies the opportunity to be kind (while admitting that it was entirely your fault) potentially helps both them and you.

Yep, and an opportunity to educate on things like budgets and billing alarms to try to prevent this in the future.
Yeah, every time I’ve heard this story support have always fixed it, at least the first time per account
AWS should have a cost cap. Set a max spend value and shut down all servers if you spent it.
> AWS should have a cost cap. Set a max spend value and shut down all servers if you spent it.

That might make sense for some particular services (e.g., capping the cost of active EC2 instances), but lots of AWS costs are data storage costs, and you probably don't want all your data deleted because you ran too many EC2 instances and hit your budget cap.

What exactly you are willing to shut off to avoid excess spend, and what you don't want sacrificed automatically, varies from customer to customer, so there's no good one-size-fits-all automated solution.

I think if resources had an option of "At cap: Do nothing / Shut down / Shut down and erase data", that would cover most of the use cases.
Keeping the data for a week but completely inaccessible would not be a huge cost for AWS yet a big relief for startups.
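To make the proposal concrete, here's a sketch of the per-resource "At cap" setting combined with a one-week grace period in which erasable data is locked rather than deleted. This is not an AWS feature. The enum, the function, the action strings, and the seven-day period are hypothetical illustrations of the idea.

```python
# Sketch of a per-resource "At cap" setting plus a grace period
# in which data is kept but inaccessible. Hypothetical, not an AWS API.
from datetime import datetime, timedelta
from enum import Enum
from typing import Dict

GRACE = timedelta(days=7)


class AtCap(Enum):
    DO_NOTHING = "do nothing"
    SHUT_DOWN = "shut down"
    SHUT_DOWN_AND_ERASE = "shut down and erase data"


def plan(resources: Dict[str, AtCap], cap_hit_at: datetime, now: datetime) -> Dict[str, str]:
    """What should happen to each resource once the spending cap has been hit."""
    actions = {}
    for name, setting in resources.items():
        if setting is AtCap.DO_NOTHING:
            actions[name] = "keep running"
        elif setting is AtCap.SHUT_DOWN:
            actions[name] = "stop, keep data"
        elif now - cap_hit_at < GRACE:
            actions[name] = "stop, lock data until the grace period ends"
        else:
            actions[name] = "erase"
    return actions
```

The production database can be marked "do nothing" while scratch buckets get "shut down and erase". Even then, the grace period leaves a week to notice and undo a badly chosen cap.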
We used to have a bunch of billing graphs in Stackdriver, with alerting thresholds wired to PagerDuty, to catch exactly situations like this.
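The "graph with threshold lines" idea can be reduced to a few lines of code. This sketch fires each threshold once, when month-to-date spend first crosses it, rather than re-alerting on every sample above it. The threshold values are placeholders, and the function name is hypothetical.

```python
# Sketch: fire each billing threshold once, when month-to-date spend first crosses it.
from typing import Iterable, List


def newly_crossed(previous_usd: float, current_usd: float,
                  thresholds: Iterable[float]) -> List[float]:
    return [t for t in sorted(thresholds) if previous_usd < t <= current_usd]


THRESHOLDS = [100, 500, 1000]
```

When month-to-date spend resets at the start of a new month, nothing fires until it climbs past a line again, which is usually what you want.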