| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ignoreusernames 517 days ago

> Spark is "in-memory" in the sense that it isn't forced to spill results to disk between operations

I see your point, but that's only true within a single stage. Any operator that requires partitioning (groupBys and joins for example) requires writing to disk

> [...] which used to be a point of comparison to MapReduce specifically.

So each mapper in hadoop wrote partial results to disk? LOL this was way worse than I remember than. It's been a long time that I've dealt with hadoop

> Not ground-breaking nowadays but when I was doing this stuff 10+ years

I would say that it wouldn't be ground breaking 20 years ago. I feel like hadoop influence held up our entire field for years. Most of the stuff that arrow made mainstream and is being used by a bunch of engines mentioned in this thread has been known for a long time. It's like, as a community, we had blindfolds on. Sorry about the rant, but I'm glad the hadoop fog is finally dissipating