by ignoreusernames
517 days ago
> Spark is "in-memory" in the sense that it isn't forced to spill results to disk between operations

I see your point, but that's only true within a single stage. Any operator that requires partitioning (groupBys and joins, for example) requires writing to disk.

> [...] which used to be a point of comparison to MapReduce specifically.

So each mapper in Hadoop wrote partial results to disk? LOL, this was way worse than I remember, then. It's been a long time since I've dealt with Hadoop.

> Not ground-breaking nowadays but when I was doing this stuff 10+ years

I would say that it wouldn't have been ground-breaking 20 years ago. I feel like Hadoop's influence held up our entire field for years. Most of the stuff that Arrow made mainstream, and that is being used by a bunch of the engines mentioned in this thread, has been known for a long time. It's like, as a community, we had blindfolds on.

Sorry about the rant, but I'm glad the Hadoop fog is finally dissipating.