by IvanVergiliev 3821 days ago
Reduce can perform reductions locally on each machine before shuffling the data. This decreases both memory use and network overhead. If you need all the elements for a given key, e.g. to display them to a user or save them to a DB, then groupBy is probably what you want. If you're going to perform some form of reduce after the groupBy, though, it's likely sub-optimal, since every element gets shuffled before anything is combined.
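
To make the difference concrete, here's a rough sketch in plain Python (no cluster framework involved). The partitions, the function names and the "records shuffled" counter are all made up for illustration, and summing stands in for whatever reduction you actually run. It only models the idea of combining locally before the shuffle, not how any particular engine implements it.

```python
from collections import defaultdict

# Hypothetical data: each inner list is one machine's partition of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

def shuffle_group_then_reduce(partitions):
    """groupBy-style: ship every record across the network, then sum per key."""
    shuffled = [pair for part in partitions for pair in part]  # all records move
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    totals = {key: sum(values) for key, values in groups.items()}
    return totals, len(shuffled)

def shuffle_reduce_with_local_combine(partitions):
    """reduce-style: sum per key on each machine first, then ship partial sums."""
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())  # at most one record per key per machine
    totals = defaultdict(int)
    for key, partial in shuffled:
        totals[key] += partial
    return dict(totals), len(shuffled)

grouped_totals, grouped_records = shuffle_group_then_reduce(partitions)
reduced_totals, reduced_records = shuffle_reduce_with_local_combine(partitions)
```

Both paths end up with the same totals, but the second one shuffles one record per key per machine instead of every record. That shortcut is only valid when the operation is associative and commutative, like a sum or a max.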