HN Mirror


	by m0zg 2092 days ago
	That's also a problem Google could solve fairly easily by automatically spinning up smaller, entirely separate serving clusters for customers who are worried about such a blowout (for a fee, obvs). It's just the serving tree (+ whatever in-memory storage service they use to do distributed joins nowadays), so there's no need to duplicate the rest of the service. The caveat is that a smaller cluster will favor query optimizations specific to that smaller cluster. Some of those "small cluster" optimizations could hurt query performance when deployed against BQ proper with its tens of thousands of workers. Also, BQ does explain the query plan to some extent: https://cloud.google.com/bigquery/query-plan-explanation. Not quite at the level of a "regular" SQL DB, but it does give you some info to work with when optimizing queries. If you haven't used it in a while, I'd give it another try.

2 comments

quirmian 2092 days ago

I believe this is exactly what slot reservations in BigQuery achieve. Instead of paying on-demand pricing that is determined by data read, you purchase a fixed number of “slots” that are shared by queries running within that particular project.
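To make the difference between the two billing models concrete, here is a rough Python sketch. The prices are made-up placeholders (not Google's actual rates), and the function names are hypothetical; the only point is that on-demand cost scales with bytes read while a reservation is a flat fee for a fixed number of slots.

```python
TIB = 2**40  # bytes in one tebibyte


def on_demand_cost(bytes_read: int, price_per_tib: float) -> float:
    """Cost of a query when billing is determined by data read."""
    return bytes_read / TIB * price_per_tib


def reservation_cost(slots: int, price_per_slot_month: float) -> float:
    """Flat monthly cost of a slot reservation, independent of data read."""
    return slots * price_per_slot_month


def breakeven_tib(slots: int, price_per_slot_month: float, price_per_tib: float) -> float:
    """TiB read per month at which the reservation costs the same as on-demand."""
    return reservation_cost(slots, price_per_slot_month) / price_per_tib


if __name__ == "__main__":
    # Placeholder prices, chosen only for illustration.
    PRICE_PER_TIB = 5.0
    PRICE_PER_SLOT_MONTH = 20.0

    print(on_demand_cost(3 * TIB, PRICE_PER_TIB))
    print(breakeven_tib(100, PRICE_PER_SLOT_MONTH, PRICE_PER_TIB))
```

With a reservation, an accidentally huge query cannot raise the bill; under this model it can only take longer.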

m0zg 2092 days ago

Ah OK, after reading their docs I see they've changed what "slots" used to mean in Dremel (internal version of BQ). It used to be that slots _guaranteed_ capacity, but did not limit it. Meaning that you could rely on having a certain number of workers in the cluster when you issue a query, but if Dremel had more it'd give you all it's got. Obviously this is not viable when people have to pay per terabyte read, because a single query can read a huge amount of data.
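A toy Python sketch of the two meanings of "slots" described above, as I read them. Both functions and their parameter names are hypothetical simplifications, not how Dremel or BigQuery actually schedule work: the first treats the slot count as a floor, the second as a fixed allocation.

```python
def workers_with_guaranteed_floor(guaranteed: int, free_workers: int) -> int:
    """Old meaning: you get at least `guaranteed` workers,
    and everything that happens to be free if that is more."""
    return max(guaranteed, free_workers)


def workers_with_fixed_reservation(reserved: int, free_workers: int) -> int:
    """Reservation meaning: you get the slots you purchased,
    no matter how many other workers are free."""
    return reserved


if __name__ == "__main__":
    # Same request against a mostly idle cluster.
    print(workers_with_guaranteed_floor(500, 20_000))
    print(workers_with_fixed_reservation(500, 20_000))
```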

What they have now strikes me as an even better solution to the problem of bankrupting someone with a query. Not sure how pricing compares to Redshift et al, but pricing is the easiest thing for Google to change.

manigandham 2092 days ago

Slots don't control how much data you consume; your query does.

If you need to read a terabyte of data to answer your query, then more slots only get it done faster.

statusgraph 2092 days ago

BQ Slots lets you do essentially that (pre-commit to a particular cluster size).
