by remilouf 2619 days ago
I used BigQuery at the startup I worked for before, and it worked wonders for our 12 TB of data. I think it would be a bit overkill in our situation, even though not having to manage a DB is great.

That’s the beauty of BQ: it scales well, but it also works just fine for smaller use cases. It doesn’t get simpler than SQL.

Another thing to consider is that BQ now has (simpler) ML models built in, which further reduces the complexity of your pipeline: https://cloud.google.com/bigquery/docs/bigqueryml-intro
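
To give a rough idea of what "built in" means here, this is a minimal sketch that builds a BigQuery ML `CREATE MODEL` statement as a string. The dataset, model, table, and label names are made up for illustration, and the helper `create_model_sql` is hypothetical; only the statement shape follows the BigQuery ML docs.

```python
def create_model_sql(model: str, source_table: str, label: str,
                     model_type: str = "linear_reg") -> str:
    """Build a BigQuery ML CREATE MODEL statement (names are placeholders)."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


# Hypothetical names, for illustration only.
sql = create_model_sql("mydataset.sales_model", "mydataset.sales", "revenue")
print(sql)
```

The point is that training is just another SQL statement you run in BQ, so there is no separate training service to wire into the pipeline.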

If you are not on GCP, then I’d consider AWS Athena for querying the Parquet files, but you still have to structure these efficiently beforehand.
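
One common way to "structure these efficiently" is a Hive-style partitioned layout (`key=value/` folders), which Athena can use to skip files a query does not need. A minimal sketch, assuming you partition by date; the bucket name and the helper `partition_prefix` are hypothetical:

```python
from datetime import date


def partition_prefix(base: str, day: date) -> str:
    """Return a Hive-style partition prefix such as .../year=2019/month=03/day=07/."""
    return (
        f"{base.rstrip('/')}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"
    )


# Hypothetical bucket, for illustration only.
prefix = partition_prefix("s3://example-bucket/events/", date(2019, 3, 7))
print(prefix)
```

Whether date is the right partition key depends on how you filter in your queries; this only shows the naming convention.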

I will consider that. How about Redshift?

We had Redshift for our 23TB+ dataset and it worked great. The downside is that it can get pricey, so do a cost analysis before you commit. Also know that views in Redshift are not materialized, so it’s more efficient to create physical tables from the views, which then adds maintenance overhead. The last thing I’ll add is that you’ll need to experiment with compression settings for your data. For us, a combination of ZSTD and bytedict was all we needed.
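
For what it's worth, here is a rough sketch of the two Redshift points above: snapshotting a view into a physical table, and setting per-column compression with `ENCODE`. The table, view, and column names are invented, and both helpers are hypothetical; they only build SQL strings. Which encoding suits which column is an assumption you'd have to test on your own data. (Redshift has since added materialized views, which may remove the need for the first part.)

```python
from typing import List, Tuple


def materialize_view_sql(view: str, table: str) -> List[str]:
    """Statements that rebuild a physical table from a view's current contents."""
    return [
        f"DROP TABLE IF EXISTS {table};",
        f"CREATE TABLE {table} AS SELECT * FROM {view};",
    ]


def create_table_sql(table: str, columns: List[Tuple[str, str, str]]) -> str:
    """CREATE TABLE with an explicit ENCODE clause for each column."""
    cols = ",\n".join(
        f"    {name} {dtype} ENCODE {encoding}" for name, dtype, encoding in columns
    )
    return f"CREATE TABLE {table} (\n{cols}\n);"


# Hypothetical schema: ZSTD as a general default, bytedict for a
# low-cardinality column (an assumption, not a measured result).
ddl = create_table_sql(
    "events",
    [
        ("event_id", "BIGINT", "zstd"),
        ("payload", "VARCHAR(1024)", "zstd"),
        ("country", "VARCHAR(2)", "bytedict"),
    ],
)
refresh = materialize_view_sql("daily_sales_v", "daily_sales")
print(ddl)
print("\n".join(refresh))
```

The refresh statements have to be rerun whenever the underlying data changes, which is the maintenance overhead mentioned above.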