HN Mirror



	by Keyframe 1872 days ago
	Full scan. It's still absurd what we've come to, to think this is tolerable.

3 comments

flakiness 1872 days ago

A full scan rarely happens on BQ because of the nature of the columnar store. Try a public dataset like the HN archive and see how much a query actually costs. You'll need a very advanced (or stupid) query to read 100GB at once on BQ.
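To put rough numbers on that: a columnar store reads only the columns a query references, so the bytes billed depend on column widths rather than on the full row. Below is a minimal sketch, not BigQuery's actual billing logic. The table layout, the row count, and the use of decimal gigabytes are all made-up assumptions; the only figure taken from this thread is the $0.50 per 100GB rate.

```python
# Hypothetical table: column name -> average bytes per value (illustrative only).
COLUMN_WIDTHS = {"id": 8, "author": 12, "title": 60, "text": 400, "score": 8}
ROW_COUNT = 30_000_000  # assumed row count, not a real dataset size

# Rate implied by the thread: $0.50 per 100GB scanned.
PRICE_PER_GB = 0.50 / 100


def bytes_scanned(columns, rows=ROW_COUNT):
    """Bytes read when only the referenced columns are scanned, each in full."""
    return sum(COLUMN_WIDTHS[c] for c in columns) * rows


def query_cost(columns, rows=ROW_COUNT):
    """Dollar cost of scanning the given columns (1 GB taken as 1e9 bytes)."""
    return bytes_scanned(columns, rows) / 1e9 * PRICE_PER_GB


narrow = query_cost(["author", "score"])   # touches two small columns
wide = query_cost(list(COLUMN_WIDTHS))     # touches every column
print(f"narrow: ${narrow:.4f}  wide: ${wide:.4f}")
```

Under these assumptions the two-column query is billed for a small fraction of what a query over every column would cost, which is why hitting 100GB in one go takes some effort.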


phpnode 1872 days ago

Reading 100GB from disk costs $0.50? This is absolutely incredible to me. How has this become acceptable to the industry?


asdf123wtf 1872 days ago

It's not for every use case, but BigQuery is often a very stupidly cheap datastore. Query results get cached, and repeats don't incur a charge unless the data has changed.

It's not a datastore to power a CRUD app, or anything requiring frequent queries, but it's a great place to stash gobs of logs that you may need to query at some point. It's also great for serverless batch workloads, and is often cheaper in both time and money than firing up Spark clusters or something similar to do the work.

Quite frankly, it's awesome. But sure, they do use it as a tool for lock-in, and for some cases it would be prohibitively expensive.
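The caching behaviour described in this comment can be modelled with a toy example. This is only a sketch of the idea (identical repeats are free until the data changes), not how BigQuery implements its cache. The `CachedWarehouse` class, its method names, and the version counter are hypothetical; the price is the $0.50 per 100GB figure from the thread.

```python
class CachedWarehouse:
    """Toy model: results are cached per (query text, table version); hits cost nothing."""

    def __init__(self, price_per_gb=0.50 / 100):
        self.price_per_gb = price_per_gb
        self.table_version = 0
        self._cache = {}
        self.total_charged = 0.0

    def load_data(self):
        # Any change to the data bumps the version, so old cache keys stop matching.
        self.table_version += 1

    def run(self, sql, gb_scanned):
        """Return (result, cost). A repeat against unchanged data is served from cache."""
        key = (sql, self.table_version)
        if key in self._cache:
            return self._cache[key], 0.0
        cost = gb_scanned * self.price_per_gb
        self.total_charged += cost
        result = f"result of {sql!r} at v{self.table_version}"  # stand-in for real rows
        self._cache[key] = result
        return result, cost


wh = CachedWarehouse()
_, first = wh.run("SELECT COUNT(*) FROM logs", gb_scanned=100)   # charged
_, repeat = wh.run("SELECT COUNT(*) FROM logs", gb_scanned=100)  # cache hit, free
wh.load_data()                                                   # data changed
_, after = wh.run("SELECT COUNT(*) FROM logs", gb_scanned=100)   # charged again
```

Keying the cache on the table version is just one simple way to express "unless the data has changed"; a real system would track this differently.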


npsf3000 1872 days ago

Incredible as in 'that's a great deal' or as in 'that seems a ripoff'?

I find 50c to read 100GB from disk, do useful work on it (including running JavaScript code or ML models if you are so inclined) and return a result in seconds... pretty damn incredible.


Justin_K 1872 days ago

We find value with this model because we don't pay for the instance when it's idle and queries come back extremely quickly.
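The trade-off in this comment (pay per query versus pay for an instance that sits idle) comes down to a break-even count. A rough sketch follows, using the $0.50 per 100GB rate from the thread. The monthly instance cost in the example is a purely hypothetical placeholder, not a real price, and the function name is invented for illustration.

```python
def break_even_queries(instance_cost_per_month, gb_per_query, price_per_gb=0.50 / 100):
    """Queries per month at which per-query billing equals a fixed instance cost."""
    return instance_cost_per_month / (gb_per_query * price_per_gb)


# Hypothetical: an always-on instance costing $1000/month, queries scanning 100GB each.
n = break_even_queries(instance_cost_per_month=1000.0, gb_per_query=100)
print(f"break-even at about {n:.0f} queries per month")
```

Below that volume the per-query model is cheaper under these assumptions; above it, the fixed instance starts to win on price alone.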


manigandham 1872 days ago

A query reading about 100GB with one of the most advanced data warehouse systems with no operational overhead and integration into a major cloud environment costs $0.50.

There's a lot more to value than the price.


Keyframe 1872 days ago

Exactly.


gnfargbl 1872 days ago

The point of BQ is to allow you to perform queries which are ad-hoc and/or touch a significant fraction of the data. If you have a problem of that shape, then full column scans are not merely tolerable, they are optimal.
