by buremba 846 days ago
This happens because Google hides the query cost behind its abstract "TBs scanned" metric (measured against their own data format, which isn't even open source, so it's hard to estimate in advance) or, even worse, its "slots" mechanism. Only a fraction of people try to understand how much these slots cost, and most of them are people who got an unexpected bill after using BigQuery and then became more aware of how the product works.

If GCP returned the query cost in the API and showed it directly in the console when you run a query, it would be much easier for their users, but unfortunately it's not in Google's interest, for obvious reasons.


Exactly, even after seeing the issue I can't make heads or tails of what the hell "TBs scanned" means relative to row counts, etc. Likewise, it seems to assume you know what the tables contain, and on a dataset you didn't build yourself, how can you know the tables are optimized to lower your costs? Hell, how can you even know what the costs are?

"TBs scanned" is the number of tebibytes of stored data that the system had to scan to serve your query. This is how BQ is billed in the on-demand model.

The console shows you this number (in very small letters) after you have entered the query but before you press go. In the on-demand billing model, which is what you were using, you can multiply this number by $6.25 to understand your query cost, exactly.
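To make that multiplication concrete, here is a rough sketch in Python. It assumes you already have the estimated bytes figure (for example, the number the console shows before you run the query) and uses the $6.25 per TiB on-demand rate quoted above. The function name and the default price are illustrative, not part of any Google API, and the sketch ignores any free tier or minimum-billing rules that may apply to your account.

```python
# Hypothetical helper: converts an estimated "bytes scanned" figure into an
# approximate on-demand cost. The rate is the $6.25/TiB quoted in this thread;
# check current pricing for your region before relying on it.

BYTES_PER_TIB = 2 ** 40  # 1 tebibyte


def estimate_on_demand_cost_usd(bytes_scanned: int, price_per_tib: float = 6.25) -> float:
    """Return the approximate cost in USD for scanning `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    tib_scanned = bytes_scanned / BYTES_PER_TIB
    return tib_scanned * price_per_tib


if __name__ == "__main__":
    # A query estimated at 512 GiB scans half a TiB.
    print(f"${estimate_on_demand_cost_usd(512 * 2 ** 30):.2f}")
```

So a query the console estimates at 512 GiB works out to roughly half of $6.25, and a full 1 TiB scan to $6.25.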

It's a design that's hostile to new customers, I agree. But it is comprehensible.

There should be a cost estimate displayed prominently by default, with an option to turn it off for power users who know what they're doing (while keeping the current, less prominent estimate of the amount of data scanned).