by hello_moto 1681 days ago
It'll take a few more years until these companies fix all the bugs and address all the scalability issues.

As of today, these companies are not good enough to take on the Data Warehouse part.


Spark has always been able to handle way larger scale than any DW.
Handle what though?

Can Spark query 100Bn rows of structured data while performing aggregations on multiple fields (or dimensions)?

In my previous company, we had 63 petabytes of data in Snowflake.
That sounds great: the storage problem is solved.

What about large-scale reads via OLAP queries (y'know, the typical measures and dimensions)?

That's a respectable amount for a DW, true. Spark and its ilk are designed for much larger scales though. Multiple FAANG use cases for Spark are in the petabytes-per-week range.