HN Mirror


by hobs 1825 days ago

There's certainly some of that, and I have experienced project managers asking me to put 5GB datasets in Spark... but there's definitely a set of problems where vertical scaling is a PITA, and MPP generally breaks the SQL guarantees anyway, costs a milli, requires rewrites, etc.

When you want to process N+1 TB/PB, it's hard to throw standard relational approaches at it imo.

SQL is strings all the way down, and testing the database itself is often a shitshow...

1 comment

taeric 1825 days ago

I agree that it can easily be "strings all the way down", but often the way folks make Spark testable is only slightly more advanced than using views in a SQL world. Add in an understanding of window functions, and some trivial assertions on expected query results go a long way.
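
To make that concrete, here is a rough sketch of the "views plus window functions plus trivial assertions" idea. It uses Python's built-in sqlite3 as a stand-in for whatever database you actually run, and assumes the bundled SQLite is 3.25 or newer (needed for window functions). The orders table, the running_totals view, and the helper names are all made up for illustration.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a tiny fixture and the view under test."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical fixture table; small enough to check by hand.
        CREATE TABLE orders (customer TEXT, order_day INTEGER, amount INTEGER);
        INSERT INTO orders VALUES
            ('a', 1, 10), ('a', 2, 20), ('b', 1, 5), ('b', 3, 15);

        -- The query logic lives in a view, so a test can target it by name.
        CREATE VIEW running_totals AS
        SELECT customer,
               order_day,
               SUM(amount) OVER (
                   PARTITION BY customer ORDER BY order_day
               ) AS running_total
        FROM orders;
        """
    )
    return conn


def fetch_running_totals(conn):
    """Read the view in a deterministic order so results compare cleanly."""
    return conn.execute(
        "SELECT customer, order_day, running_total "
        "FROM running_totals ORDER BY customer, order_day"
    ).fetchall()


def duplicate_keys(conn):
    """Use ROW_NUMBER() to list (customer, order_day) keys that repeat."""
    return conn.execute(
        """
        SELECT customer, order_day
        FROM (
            SELECT customer, order_day,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer, order_day ORDER BY amount
                   ) AS rn
            FROM orders
        )
        WHERE rn > 1
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_db()
    # Trivial assertions on expected query results.
    assert fetch_running_totals(conn) == [
        ("a", 1, 10), ("a", 2, 30), ("b", 1, 5), ("b", 3, 20),
    ]
    assert duplicate_keys(conn) == []
```

The same shape carries over to Spark if you register temp views and compare collected rows against a small fixture, though the details there will differ.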
