by Keyframe 1872 days ago
Full scan. It's still absurd that we've come to think this is tolerable.
3 comments

A full scan rarely happens on BQ because of the nature of the columnar store. Try a public dataset like the HN archive and see how much a query actually costs. You'd need a very advanced (or stupid) query to read 100GB at once on BQ.
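To make the columnar point concrete, here is a rough sketch of how the bytes read depend on which columns a query references. The column names and sizes are made up for illustration, and the rate is simply the $0.50 per 100GB figure discussed in this thread, not an official price list.

```python
# Hypothetical per-column storage sizes in GB for an illustrative table.
COLUMN_SIZES_GB = {
    "id": 2.0,
    "author": 3.0,
    "timestamp": 2.0,
    "title": 8.0,
    "text": 85.0,
}

# Assumed rate, taken from the figure quoted in this thread: $0.50 per 100 GB.
PRICE_PER_GB = 0.50 / 100


def scanned_gb(selected_columns):
    """Sum the sizes of only the columns a query references."""
    return sum(COLUMN_SIZES_GB[name] for name in selected_columns)


def query_cost(selected_columns):
    """Estimated cost in dollars for fully scanning the selected columns."""
    return scanned_gb(selected_columns) * PRICE_PER_GB


narrow = query_cost(["author", "timestamp"])  # touches 5 GB of the 100 GB table
full = query_cost(COLUMN_SIZES_GB.keys())     # touches every column
print(f"narrow: ${narrow:.3f}, full: ${full:.2f}")
```

In this toy model, only a query that references every column (including the big `text` one) gets anywhere near the full 100GB.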
Reading 100GB from disk costs $0.50? This is absolutely incredible to me. How has this become acceptable to the industry?
It's not for every use case, but BigQuery is often a stupidly cheap datastore. Query results get cached, and repeats don't incur a charge unless the data has changed.
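A toy model of that caching behavior, for anyone who wants to reason about repeat costs. This is only a sketch of the idea as described here, not how the service implements it; the class, the method names, the query text, and the rate are all hypothetical.

```python
import hashlib


class CachedQueryRunner:
    """Toy model: repeating the same query is free until the data changes."""

    def __init__(self, price_per_gb):
        self.price_per_gb = price_per_gb
        self.data_version = 0
        self._seen = set()

    def data_changed(self):
        """Bump the version so earlier cached results no longer match."""
        self.data_version += 1

    def run(self, sql, scanned_gb):
        """Return the charge in dollars for this run (0.0 on a cache hit)."""
        key = (hashlib.sha256(sql.encode()).hexdigest(), self.data_version)
        if key in self._seen:
            return 0.0
        self._seen.add(key)
        return scanned_gb * self.price_per_gb


# Assumed rate: the $0.50 per 100 GB figure from this thread.
runner = CachedQueryRunner(price_per_gb=0.50 / 100)
sql = "SELECT author, COUNT(*) FROM comments GROUP BY author"  # hypothetical table

first = runner.run(sql, scanned_gb=100)    # charged for the scan
repeat = runner.run(sql, scanned_gb=100)   # same query, same data: no charge
runner.data_changed()
after_change = runner.run(sql, scanned_gb=100)  # data changed: charged again
```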

It's not a datastore to power a CRUD app, or anything requiring frequent queries, but it's a great place to stash gobs of logs that you may need to query at some point. It's also great for serverless batch workloads, and is often cheaper in both time and money than firing up Spark clusters or something similar to do the work.

Quite frankly, it's awesome. But sure, they do use it as a tool for lock-in, and for some cases it would be prohibitively expensive.

Incredible as in 'that's a great deal' or as in 'that seems a ripoff'?

I find 50c to read 100GB from disk, do useful work on it (including running JavaScript code or ML models if you are so inclined) and return a result in seconds... pretty damn incredible.

We find value with this model because we don't pay for the instance when it's idle and queries come back extremely quickly.
A query that reads about 100GB on one of the most advanced data warehouse systems, with no operational overhead and with integration into a major cloud environment, costs $0.50.

There's a lot more to value than the price.

Exactly.
The point of BQ is to allow you to perform queries which are ad-hoc and/or touch a significant fraction of the data. If you have a problem of that shape, then full column scans are not merely tolerable, they are optimal.