There are plenty of APIs on the internet that are free that query a database for information. If queries are too expensive it's not viable to run for free.
Or, GCP could implement cost/resource/use limits, which would allow them to give away whatever they wanted for free without any concern about people over using it, while also allowing people to avoid shooting their own feet off.
I don’t disagree but how does that work exactly? When you hit the quota the query gets cancelled? That’s definitely already a feature of Redshift Spectrum with WLM. Does BigQuery offer something similar?
My first choice would be something like "this query will cost $13953, which exceeds your default cap of $100; please click the confirm button if you really want to run it". (The dollars could be CPU-minutes or whatever if you want to use resource based limits, which might play nicer with a free tier)
Edit: rereading, I think this is actually for non-interactive scripts, in which case yes it should just cancel the query
You can set the size limit for individual queries. Plus the custom quotas and everything.
Part of the problem is that the OP wrote a script with a loop. So say you set the limit to 50 GiB per query, but then write a script that runs a 49 GiB query 1000 times...
That type of batch process should be designed much more carefully to consider costs.
The article doesn't say anything about a loop, and the estimated usage by the Google responder makes it seem like the cost is from a single "SELECT *".
> I was doing historical evaluation for a few sites, so I was running a query for each month going back to 2016 for each site. I've done this before with no real issues, and if I knew the charges were rapidly exploding I'd have halted the script immediately - but instead it ran for 2 hours and the first notice I got was the CC charge.
So looks like a loop of ((6 * 12) + 2) * #sites iterations with a full table scan every time.
I've forgotten more Sql than most people ever learn. Time is also valuable and I make trade-offs. Should I spend hours (eg. $$$) to optimize or run a non-optimized query in the background for a different cost? Well, I didn't think the time/benefit/cost equation favored tuning, if I had known that I'd have spent time on tuning. If you offer something for "free" and then change the cost, and don't have any alerting mechanisms to inefficient queries, it's impossible to evaluate trade offs.
It's rarely interesting logic that makes it expensive. Because the per-query charge is not based on compute cycles but the amount of data scanned. This is sufficient:
`SELECT * FROM super_wide_table_with_lots_of_text
WHERE NOT filter_on_partitions_or_clusters`
Select * is dangerous because it's a column store. You really need to look at the schema and select only the things you want. And when exploring the data it's important to use sane limits and pull from a single partition.
The idea that Google would give away BigQuery for free if only people would write better SQL queries.