|
|
|
|
|
by ozataman
3172 days ago
|
|
There is a different side to the cost benchmark that's not captured by the description here. If your use case needs a lot of stored data but not necessarily a matching degree of
peak CPU (even if your query load is otherwise pretty consistent), Redshift will become really expensive really fast and it will feel like a waste. BigQuery will meanwhile keep costs linear (almost) in your actual query usage with very low storage costs. For example, you may need to provision a 20-node cluster only because you need the 10+ terabytes in storage across several datasets you need to keep "hot" for sporadic use throughout the day/week, but don't nearly need all that computational capacity around the clock. Unlike BigQuery, Redshift doesn't separate storage from querying. Redshift also doesn't offer a practically acceptable way to scale up/down; resizes at that scale take up to a day, deleting/restoring datasets would cause lots of administrative overhead and even capacity tuning between multiple users is a frequent concern. Making matters worse, it is common for a small number of tables to be the large "source of truth" tables that you need to keep around to re-populate various intermediate tables even if they themselves don't get queries that often. In Redshift, you will provision a large cluster just to be able to keep them around even though 99% of your queries will hit one of the smaller tables. That said, I haven't tried the relatively new "query data on S3" Redshift functionality. It doesn't seem quite the equivalent of what BigQuery does, but may perhaps alleviate this issue. Sidenote: I have been a huge Redshift fan pretty much since its release under AWS. I do however think that it is starting to lose its edge and show its age among the recent advances in the space; I have been increasingly impressed with the ease of use (including intra team and even inter-team collaboration) in the BigQuery camp. |
|
Spectrum extends that even further, allowing you to have recent and reference data locally stored and keep archival data in S3 available for query at any time.