Hacker News new | ask | show | jobs
by tmoertel 1294 days ago
Consider the monoid abstraction in the context of batch processing. Anywhere you have a monoid, you have a corresponding “banker’s law” that says you can find the generalized sum of a collection of items by partitioning the items into groups, computing the sum over each group, and then taking the sum of the sums as your final result. This idea has many applications in batch processing.

For example, in the MapReduce framework, this idea gives rise to “Combiners” that summarize the results of each map worker to massively lower the cost of shuffling the results of the Map stage over the network prior to the Reduce stage.

Another example: In distributed database systems, this idea allows many kinds of addition-like aggregations to be performed more efficiently by computing the local sum for each group under the active GROUP BY clause before combining the groups’ subtotals to yield the wanted gobal totals.

Basically, in any situation in which you have to compute a global sum of some kind over a collection of items, and some of those items are local to each worker, you can compute the global sum faster and with fewer resources whenever you can take advantage of a monoidal structure.

1 comments

Which is exactly the concept between optimizing for string concatenation and interning them, ultimately. Sure you can make do with the "algebraic" definition of a monoid, or entirely without it, but that doesn't mean the abstraction isn't there to inform your thinking and your research.

One point that really stuck with me is how people who apply category theory (say, people at the topos institute) use the concepts as little tools, and everytime a problem crosses their way, they try out all the little tools to see if one of them works, similar to how Feynman describes carrying out problems in his head until one day a technique unlocks something).

Having more general abstractions just allows them to be applied to more problems.