by ramraj07 2195 days ago
When you say cloud-native data warehouses, do you mean things like Snowflake/Redshift/BigQuery, or something else? As part of an org making the transition from Spark to these, I can definitely agree that these tools are better suited for practical data engineering at medium-to-big-data scale (anything that isn't Google/Facebook).

I was thinking AWS Athena (Presto) for the data warehouse and AWS Glue (Spark) for ETL. Redshift has always had the feel of a column store appliance that runs side by side with your other IaaS resources. There is nothing particularly cloud-native about it other than the way it is provisioned and managed in the AWS web console. Amazon QuickSight seems like an excellent alternative to enterprise BI pivot-table tools like Tableau, Excel, PowerPivot, Business Objects, and Cognos. Amazon seems to be ahead of the competition (again) when it comes to ETL/DW/BI-as-a-Service, at least in terms of price/performance.

I don't know anything about Snowflake. SQL makes BigQuery and Hive easier to program than MapReduce/Pig, but I don't think of these technologies as data warehouses.

Column stores (compressed bitmap indexes, batch-updated with an ETL-like process) make exceptional data warehouses. Row-oriented data warehouses all feel like anachronisms now.
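
For anyone who hasn't seen the idea, here is a rough Python sketch of what I mean by a bitmap-indexed column store. It is a toy under stated assumptions: the table, the column names, and the helpers `build_bitmap_index` and `rle_encode` are all made up for illustration, and real engines use much more sophisticated compression than plain run-length encoding.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Batch-build one bitmap per distinct value (hypothetical helper).

    Bit i of a bitmap is set when row i holds that value. The whole index
    is built in one pass, the way an ETL load would do it.
    """
    bitmaps = defaultdict(int)
    for row_id, value in enumerate(column):
        bitmaps[value] |= 1 << row_id
    return dict(bitmaps)


def rle_encode(bitmap, n_rows):
    """Run-length encode a bitmap as (bit, run_length) pairs, row 0 first."""
    runs = []
    for row_id in range(n_rows):
        bit = (bitmap >> row_id) & 1
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return [tuple(run) for run in runs]


# Made-up fact table, stored column by column instead of row by row.
region = ["us", "us", "eu", "eu", "eu", "us"]
status = ["paid", "open", "paid", "paid", "open", "paid"]
amount = [10, 20, 30, 40, 50, 60]

region_idx = build_bitmap_index(region)
status_idx = build_bitmap_index(status)

# WHERE region = 'eu' AND status = 'paid' turns into a bitwise AND.
match = region_idx["eu"] & status_idx["paid"]
matching_rows = [i for i in range(len(amount)) if (match >> i) & 1]

# Only the `amount` column is read for the aggregate.
total = sum(amount[i] for i in matching_rows)

# Sorted or low-cardinality columns compress into a handful of runs.
eu_runs = rle_encode(region_idx["eu"], len(region))
```

The point is that filters become cheap bitwise operations over compressed bitmaps and aggregates touch only the columns they need, which is why the batch-load-then-query pattern fits warehouses so well.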