Hacker News
by JohnScolaro 63 days ago
> We had a budget alert (€80) and a cost anomaly alert, both of which triggered with a delay of a few hours. By the time we reacted, costs were already around €28,000.

I had a similar experience with GCP where I set a budget of $100 and was only emailed 5 hours after exceeding the budget by which time I was well over it.

It's mind boggling that features like this aren't prioritized. Sure it would probably make Google less money short term, but surely that's more preferable to providing devs with such a poor experience that they'd never recommend your platform to anyone else again.

I get furious every time this comes up and somehow there are bootlickers ready to defend big tech on it.

My ~2 person small business was almost put out of business due to a runaway job. I had instrumented everything perfectly according to the GCP instructions: the budget notification was hooked up to a kill switch that would fire as soon as billing went over the cap, and it did fire instantly once the notification arrived.

GCP sent the notification they offered as best practice 6 HOURS late. They did everything they could to not credit my account until they realized I had the receipts. They said an investigation revealed their pipeline was overwhelmed by the number of line items and that was the reason for the lag. ... The exact scenario it is supposed to function in. JFC.

I almost wish the people defending it were paid. It would almost be more intelligent to rush to the defense if there were a direct financial benefit.
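
To put the lag in numbers, here is a rough sketch of how much spend piles up between crossing the cap and the alert arriving. The function name and the burn rate are made up for illustration; only the six-hour delay comes from the comment above.

```python
def overshoot(burn_rate_per_hour: float, alert_delay_hours: float) -> float:
    # Spend that accrues between crossing the cap and the alert arriving.
    return burn_rate_per_hour * alert_delay_hours


# Hypothetical burn rate of 100 currency units per hour, combined with
# the six-hour notification delay described above.
print(overshoot(100.0, 6.0))  # 600.0
```

A kill switch that reacts instantly does not help here, because the overshoot is set by the delay of the signal and not by the reaction time.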

Part of it is possibly the curse of knowledge. Someone in the 99th percentile of cloud configuration experts simply can't recall their junior dev days.

In my junior dev days I always paid for the resources I used. Just because you consume a lot of resources by accident doesn't mean you shouldn't have to pay for them. Accidents do not absolve you from liability.
It's not about not paying for the resources you use. It's about not having any mechanism to limit those resources, despite that being an entirely reasonable thing for the cloud providers to provide.

Using these platforms is like giving everyone in your business a credit card with an infinite limit. If someone steals it, or anyone makes a mistake, your liability is literally unlimited for no reason at all other than complete laziness by the counterparty.

These are completely normal and expected concepts in commercial contracts that the cloud providers just don't bother to provide. I would even wager that their bigger customers have this in their contracts and only SMBs get screwed like this.

This is not about paying or not paying. It's about cloud providers not having working tools that let you limit your spending.

If I don't set up a budget and run up a huge bill, fine, sure, I should probably pay for it. But if I follow best practices and set up a rule like: "if usage > X €, then stop accepting jobs", and I do it correctly according to the vendor's instructions, yet it still lets me blow past the budget, that's entirely on the vendor.
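
For what it's worth, the rule quoted above can be sketched as a small admission gate. This is only an illustration under assumptions: `BudgetGuard` and its methods are hypothetical names, not any vendor's API, and the spend figure is assumed to come from your own metering, since the thread's whole complaint is that the provider's signal arrives late.

```python
from dataclasses import dataclass


@dataclass
class BudgetGuard:
    """Hypothetical admission gate: refuses new jobs once the cap would be exceeded."""

    cap_eur: float
    spent_eur: float = 0.0

    def record_spend(self, amount_eur: float) -> None:
        # Assumption: this figure comes from local metering, not from
        # a billing export that may lag by hours.
        self.spent_eur += amount_eur

    def accepts_jobs(self) -> bool:
        # "if usage > X, then stop accepting jobs"; this sketch also
        # stops when usage has reached the cap exactly.
        return self.spent_eur < self.cap_eur

    def try_submit(self, estimated_cost_eur: float) -> bool:
        # Reject a job whose estimated cost would push spend past the cap.
        if self.spent_eur + estimated_cost_eur > self.cap_eur:
            return False
        self.record_spend(estimated_cost_eur)
        return True


guard = BudgetGuard(cap_eur=80.0)
print(guard.try_submit(50.0))  # True
print(guard.try_submit(40.0))  # False
```

The check happens before the job runs, so the cap holds even if every alert is late. The catch is that it relies on cost estimates that you have to produce yourself.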

Interesting!

I know software is special. That's why software defects are acceptable while a crumbling bridge is not.

With that said, should this apply to other industries? If I clip a warehouse shelf on my first day driving a forklift, should my wages be garnished for life to cover the inventory? Or is the inherent nature of the logistics industry such that an accident does not always imply liability? (Or other)

The employer is held liable in such a scenario.
Sounds right. Not sure if this is the position:

If you’re coding, you should pay for your mistakes; if you’re driving a forklift (sober/responsibly), your employer should pay?

Exactly my thoughts, I cannot really understand how delayed alerts are acceptable... Did you manage to settle the cost with Google? What was the outcome?
Back in 2020 I had a similar situation. Ended up being charged $500 due to an overnight TPU training run using egress bandwidth across zones.

Google support was surprisingly understanding, after I explained the issue. They asked some clarifying questions. Then they said that they can offer a one time refund for this case.

Since then I've been paranoid about accidentally doing it again. I don't know whether GCP would refund a second time.

GCP charging for interzone traffic is an interesting financial choice. They own all the infra and in many cases this is literally moving from building to building.
There's cross-region, and cross-zone. If both boxes are located within the same zone (e.g. us-east1) then the bandwidth is free, since it's intrazone traffic. Cross-zone egress traffic (e.g. us-east1 to us-central1) is billed at a certain rate, and cross-region egress traffic (e.g. us-east1 to europe-west8) is billed at a significantly higher rate.

Amusingly enough, ingress traffic seems to always be free. So you can upload as much data as you want into their cloud, but good luck if you need to get it out.

I am referring to cross-zone within the same region, so like us-central1-a to us-central1-b. These are building to building and often never cross public land.
Oh, yes! I forgot entirely about that case. You're right, egress traffic is charged there too.
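
To keep the three cases apart, here is a minimal sketch that classifies a transfer by zone name. It assumes GCP's naming convention, where a zone is the region name plus a letter suffix (us-central1-a is in region us-central1, while names like us-east1 and us-central1 are regions). The function names are made up, and no prices are included because the rates vary.

```python
def region_of(zone: str) -> str:
    # A zone name is the region name plus a letter suffix,
    # e.g. "us-central1-a" belongs to region "us-central1".
    return zone.rsplit("-", 1)[0]


def traffic_class(src_zone: str, dst_zone: str) -> str:
    # Classify a transfer into the three cases discussed above.
    if src_zone == dst_zone:
        return "same-zone"
    if region_of(src_zone) == region_of(dst_zone):
        return "cross-zone, same region"
    return "cross-region"


print(traffic_class("us-central1-a", "us-central1-b"))  # cross-zone, same region
```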

Are the datacenters really located so close together? I assumed they weren't within walking distance of each other.

Which cloud provider actually prioritises features that cut off your money supply? Because AWS sure as shit doesn't either.
Amazon, Microsoft and Google don't offer a hard cap. Most other/smaller public cloud providers do. The reasons are quite obvious.
Spend alerts are a post-mortem tool dressed up as prevention. By design they fire after the billing cycle aggregates. The real fix is catching runaway patterns at the config level before they run. That is what we are building at Traeco. The difference between an alert after 28k euros and a flag before the first run. traeco.dev
From the linked thread: https://ai.google.dev/gemini-api/docs/billing#tier-spend-cap...

The warnings firing off hours later is obviously awful design, but the warnings are just warnings. The spend caps are something different and Gemini has them at the very least.

For most use cases where businesses use the cloud, hard spending caps are an awful idea anyway. Killing your servers the moment you start picking up loads of new customers is a surefire way to kill your big growth opportunity at exactly the wrong time.

Of course, if you're not planning for sudden massive growth, you'd be crazy to host your stuff with the big three cloud providers.

We love Amazon, Microsoft and Google being altruistic and making sure you're not burdened with too much money.
> Sure it would probably make Google less money short term, but surely that's more preferable to providing devs with such a poor experience that they'd never recommend your platform to anyone else again.

Welcome to late-stage capitalism, where there is no long-term thinking, only short-term profit stealing, and Fuck You I Got Mine.