HN Mirror



	by remilouf 2619 days ago
	I used BigQuery at the startup I worked for previously, and it worked wonders for our 12 TB of data. I think it would be a bit overkill in our situation, even though not having to manage a DB is great.


sologoub 2619 days ago

That’s the beauty of BQ - it scales well, but it works just fine in smaller use cases. It doesn’t get simpler than SQL.

Another item to consider is that BQ now has (simpler) ML models built in, further reducing the complexity of your pipeline: https://cloud.google.com/bigquery/docs/bigqueryml-intro
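
As a rough sketch of what "built in" means here: a BigQuery ML model is trained with a `CREATE MODEL` SQL statement, so the training step can sit next to the rest of your queries. The Python helper below only builds that statement as a string. The dataset, table, and column names are made up for illustration, and you would still have to submit the SQL through whatever client you use.

```python
def bqml_create_model_sql(model, source_table, label_col, feature_cols,
                          model_type="logistic_reg"):
    """Build a BigQuery ML CREATE MODEL statement as a string.

    All identifiers passed in are hypothetical; nothing is sent to BigQuery.
    """
    # The label column is selected alongside the features and named in OPTIONS.
    columns = ",\n  ".join(list(feature_cols) + [label_col])
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', "
        f"input_label_cols = ['{label_col}']) AS\n"
        f"SELECT\n  {columns}\n"
        f"FROM `{source_table}`"
    )


if __name__ == "__main__":
    print(bqml_create_model_sql(
        model="analytics.churn_model",           # hypothetical dataset.model
        source_table="analytics.user_features",  # hypothetical table
        label_col="churned",
        feature_cols=["sessions_30d", "plan"],
    ))
```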

If you are not on GCP, then I’d consider AWS Athena for querying the Parquet files, but you still have to structure these efficiently beforehand.
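
One common way to "structure these efficiently" is a Hive-style partitioned layout, where `key=value` folder names let Athena skip data that a query's filters rule out. A minimal sketch, assuming date-based partitions; the bucket and table names are hypothetical, and the function only builds a path without touching S3:

```python
from datetime import date


def partition_prefix(bucket, table, day):
    """Return a Hive-style S3 prefix (key=value folders) for one day of data."""
    return (
        f"s3://{bucket}/{table}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


if __name__ == "__main__":
    # Hypothetical bucket and table names.
    print(partition_prefix("my-data-lake", "events", date(2019, 1, 5)))
```

Parquet files for a given day would then be written under that prefix, and a query filtering on year, month, or day only needs to read the matching folders.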


remilouf 2618 days ago

I will consider that. How about Redshift?


aarbor989 2618 days ago

We had Redshift for our 23 TB+ dataset and it worked great. The downside is that it can get pricey, so do a cost analysis before you commit. Also know that views in Redshift are not materialized, so it’s more efficient to create physical tables from the views, which then adds maintenance overhead. The last thing I’ll add is that you’ll need to experiment with compression settings for your data. For us, a combination of ZSTD and BYTEDICT encodings was all we needed.
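
A rough sketch of how the two tips above might fit together: copy a view into a physical table with explicit per-column `ENCODE` settings, then run `ANALYZE COMPRESSION` to see what Redshift itself would suggest. The table, view, and column names are hypothetical, and the helper only builds SQL strings without connecting to a cluster.

```python
def redshift_snapshot_sql(table, view, columns):
    """Build statements that copy a view into a physical, compressed table.

    `columns` is a list of (name, sql_type, encoding) tuples. All names here
    are illustrative assumptions, not taken from a real schema.
    """
    col_defs = ",\n  ".join(f"{n} {t} ENCODE {e}" for n, t, e in columns)
    names = ", ".join(n for n, _, _ in columns)
    return [
        # Explicit column encodings on the physical table.
        f"CREATE TABLE {table} (\n  {col_defs}\n);",
        # Populate the table from the (non-materialized) view.
        f"INSERT INTO {table} ({names}) SELECT {names} FROM {view};",
        # Ask Redshift to report suggested encodings for comparison.
        f"ANALYZE COMPRESSION {table};",
    ]


if __name__ == "__main__":
    for stmt in redshift_snapshot_sql(
        "daily_orders_tbl",   # hypothetical physical table
        "daily_orders_view",  # hypothetical view
        [("order_id", "BIGINT", "zstd"),
         ("country_code", "CHAR(2)", "bytedict")],
    ):
        print(stmt)
```

The maintenance overhead mentioned above shows up here as the need to rerun the `INSERT` (or rebuild the table) whenever the underlying data changes.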
