by evancasey
4053 days ago
Huge Spark fan here. Love the execution model, API, supporting libs, etc. Unfortunately, Spark doesn't scale well on large datasets (10TB+). Sure, it's possible (and has been done), but right now there are too many rough edges to make it a better choice than Scalding/Cascading for data processing at scale. Most of this boils down to fine-tuning certain Spark parameters, which is a pain when you're dealing with long-running, resource-intensive workflows.