Hacker News
by ZeroCool2u 1800 days ago
I think GCP's official method for doing this is pretty similar to what you describe. You basically create a cloud function that disables billing if your bill goes over a configured limit. It's not perfect, because there's a tiny bit of lag between usage and billing calculation, but you'll only end up with a few dollars over the limit instead of thousands. Truly the nuclear option though.
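
For anyone who hasn't seen it, the decision logic in that kind of function is roughly the following. This is a minimal sketch, not Google's code: it assumes the function is triggered by a budget notification delivered as base64-encoded JSON with `costAmount` and `budgetAmount` fields, and `disable_billing` is a hypothetical callback standing in for whatever call actually detaches the billing account.

```python
import base64
import json
from typing import Callable


def should_disable_billing(event: dict) -> bool:
    """Return True when the reported cost exceeds the configured budget."""
    # Assumption: the notification body is base64-encoded JSON under "data".
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    return payload["costAmount"] > payload["budgetAmount"]


def handle_budget_event(event: dict, disable_billing: Callable[[], None]) -> bool:
    """Call the (hypothetical) disable_billing callback if over budget."""
    if should_disable_billing(event):
        disable_billing()  # the "nuclear option": everything on the project stops
        return True
    return False
```

Note that the function can only react to the cost figure it is handed, so any lag in that figure is lag in the shutdown.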
4 comments

I did this last year for my project, except instead of disabling billing, which would nuke everything, I wrote a service that runs every day, looks up my remaining monthly budget and sets the daily quotas on the APIs I use so they can't use more than my budget. (Which wouldn't be necessary if they offered monthly quotas to match the monthly billing period, but they don't.)
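
The core of the daily calculation looks something like this. It's a sketch under assumptions, not my actual service: it handles a single API with a flat, hypothetical `price_per_call`, and it caps the day's quota so that even a full day at the limit can't spend more than what's left of the month's budget.

```python
from decimal import Decimal


def daily_quota(monthly_budget: Decimal, spent_so_far: Decimal,
                price_per_call: Decimal) -> int:
    """Largest number of calls today that still fits in the remaining budget."""
    if price_per_call <= 0:
        raise ValueError("price_per_call must be positive")
    # Never go negative: if the budget is already blown, the quota is zero.
    remaining = max(monthly_budget - spent_so_far, Decimal("0"))
    # Floor division so a partial call is never counted as affordable.
    return int(remaining // price_per_call)
```

Pushing the resulting number to the provider's quota API is left out here, since that call is provider-specific.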

Then last month I got an email saying "Hey, those quotas you were setting using the API documented to set quotas, those were actually not being enforced the whole time because of undocumented issues with our systems." So basically you can't rely on the documented behavior of these systems, there's no good way to test whether your code is correct or whether your limits will work without actually exceeding your budget for real, and the whole thing is a clusterfuck. When you get a surprise bill you just have to throw yourself at the mercy of whichever first-line billing support rep is randomly assigned to your case.

Limiting your bill to something less than "potentially infinite" is just a basic fundamental feature that shouldn't require rolling your own bill-monitoring service relying on poorly documented and malfunctioning APIs with no provision for testing. There's no excuse strong enough to explain why the cloud providers can't do something reasonable here.

And this is something that should've been added years ago. How many people have decided not to use these services because trying things out to learn seemed too risky? They never gain these skills, so they end up arguing for alternatives even when they actually need these capabilities.

This official method is so broken that it's embarrassing that they recommend it. It looks like a solution, but it doesn't work.

The "tiny bit of lag" between usage and billing calculation explodes when there's a lot of usage. In my case, a broken job kept resubmitting itself continuously, and the lag grew to 8 hours and $5000 just when I needed the alert the most. My team's response time was 5 minutes... after the 8-hour GCP lag.

Very similar to this guy's story: https://blog.tomilkieway.com/72k-1/

I had to go back and forth with them over email for weeks, and ultimately threaten to publish a draft blog post full of graphs and screenshots of their own recommendations, before they cancelled the bill.

Oh, on the GCP story I was always reminded of this:

https://blog.tomilkieway.com/72k-1/

Wow, well they had some pretty fundamental design problems that the author points out. Infinite recursion due to back linking is a pretty easy way to max out your bill. I'm glad that Google forgave the bill at least.

> GCP's official method for doing this is … a cloud function that disables billing if your bill goes over a configured limit

I’d love it if GCP’s official method were to disable billing if your bill went over a limit.

Sadly, I suspect it would just disable systems instead.

How does "disabling billing" but not "disabling systems" work?

Is this like asking the phone company "When I reach my plan limits, stop charging me money but let me keep making calls?"

That was my point.

I think GP was incorrect in the mechanism there.

I quoted GP verbatim: they said "disable billing", not disable service, delivery, or systems.