| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by epanchin 892 days ago
	To be brutally honest, it’s badly considered queries like yours that mean these services cannot be free.

3 comments

threeseed 892 days ago

This comment is hilariously insane.

The idea that Google would give away BigQuery for free if only people would write better SQL queries.

link

charcircuit 892 days ago

There are plenty of APIs on the internet that are free that query a database for information. If queries are too expensive it's not viable to run for free.

link

yjftsjthsd-h 892 days ago

Or, GCP could implement cost/resource/use limits, which would allow them to give away whatever they wanted for free without any concern about people over using it, while also allowing people to avoid shooting their own feet off.

link

mulmen 892 days ago

I don’t disagree but how does that work exactly? When you hit the quota the query gets cancelled? That’s definitely already a feature of Redshift Spectrum with WLM. Does BigQuery offer something similar?

link

yjftsjthsd-h 892 days ago

My first choice would be something like "this query will cost $13953, which exceeds your default cap of $100; please click the confirm button if you really want to run it". (The dollars could be CPU-minutes or whatever if you want to use resource based limits, which might play nicer with a free tier)

Edit: rereading, I think this is actually for non-interactive scripts, in which case yes it should just cancel the query

Edit 2: https://news.ycombinator.com/item?id=39447499 was kind enough to point out that the resource-based version of this might actually exist, which is nice

link

anon84873628 892 days ago

https://cloud.google.com/bigquery/docs/best-practices-costs#...

You can set the size limit for individual queries. Plus the custom quotas and everything.

Part of the problem is that the OP wrote a script with a loop. So say you set the limit to 50 GiB per query, but then write a script that runs a 49 GiB query 1000 times...

That type of batch process should be designed much more carefully to consider costs.

link

justinclift 892 days ago

> ... the OP wrote a script with a loop.

Are you sure?

The article doesn't say anything about a loop, and the estimated usage by the Google responder makes it seem like the cost is from a single "SELECT *".

link

mulmen 892 days ago

According to https://news.ycombinator.com/item?id=39447465:

> I was doing historical evaluation for a few sites, so I was running a query for each month going back to 2016 for each site. I've done this before with no real issues, and if I knew the charges were rapidly exploding I'd have halted the script immediately - but instead it ran for 2 hours and the first notice I got was the CC charge.

So looks like a loop of ((6 * 12) + 2) * #sites iterations with a full table scan every time.

link

justinclift 892 days ago

Thanks, that does add further detail after all. :)

link

httparchive 892 days ago

I've forgotten more Sql than most people ever learn. Time is also valuable and I make trade-offs. Should I spend hours (eg. $$$) to optimize or run a non-optimized query in the background for a different cost? Well, I didn't think the time/benefit/cost equation favored tuning, if I had known that I'd have spent time on tuning. If you offer something for "free" and then change the cost, and don't have any alerting mechanisms to inefficient queries, it's impossible to evaluate trade offs.

link

johnnyo 892 days ago

Can you post what a $14,000 SQL query looks like?

If nothing else, it can be an example in my SQL 101 course.

link

anon84873628 892 days ago

It's rarely interesting logic that makes it expensive. Because the per-query charge is not based on compute cycles but the amount of data scanned. This is sufficient:

`SELECT * FROM super_wide_table_with_lots_of_text WHERE NOT filter_on_partitions_or_clusters`

Select * is dangerous because it's a column store. You really need to look at the schema and select only the things you want. And when exploring the data it's important to use sane limits and pull from a single partition.

link

httparchive 892 days ago

Here you go!

SELECT page, url, payload FROM `{table}` WHERE page like '%{site_domain}/%' AND url like '%[EXAMPLE.COM]%'

---

There's no LIMIT on it b/c I actually need all the results.

link

dabernathy89 892 days ago

This would make a great educational blog post

link

kstrauser 892 days ago

> Time is also valuable and I make trade-offs.

I'd say!

link