by potatote
4131 days ago
Can someone explain why one can't simply average the individual averages, as the author wrote below?
"No, we can't run averages on worker nodes, and then average those out. We need to have each worker node compute their sum(order_value) and count(order_value), and then sum(sum()) / sum(count()) on the coordinator node."
Thank you.
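The article puts this down to division not being commutative. More precisely, an average of averages gives every node the same weight, no matter how many records it holds. A simple example, using the boxes in the article's diagram: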
orders_2013 has sum(price) = 10, with 3 records
orders_2014 has sum(price) = 11, with 5 records
orders_2015 has sum(price) = 31, with 7 records
Average on each node, then average those averages:
( (10/3)+(11/5)+(31/7) ) / 3 = 3.32063492063
Sum the prices and count the records on each node, add both totals up on the master node, and divide once there:
(10+11+31)/(3+5+7) = 52/15 = 3.46666666667
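A minimal Python sketch that reproduces the two calculations above from the per-node (sum, count) pairs. It uses `fractions.Fraction` so the comparison is exact rather than subject to floating-point rounding. The function names are illustrative only, not taken from the article or any database.

```python
from fractions import Fraction

# Per-node partial aggregates from the example: (sum(price), count).
nodes = {
    "orders_2013": (10, 3),
    "orders_2014": (11, 5),
    "orders_2015": (31, 7),
}

def average_of_averages(parts):
    """Naive approach: average on each node, then average those averages."""
    per_node = [Fraction(s, c) for s, c in parts]
    return sum(per_node) / len(per_node)

def merged_average(parts):
    """Coordinator approach, i.e. sum(sum()) / sum(count()): divide only once."""
    total_sum = sum(s for s, _ in parts)
    total_count = sum(c for _, c in parts)
    return Fraction(total_sum, total_count)

parts = list(nodes.values())
naive = average_of_averages(parts)
correct = merged_average(parts)
print(f"average of averages:       {float(naive):.11f}")
print(f"sum(sum()) / sum(count()): {float(correct):.11f}")
```

Running it prints 3.32063492063 and 3.46666666667, matching the hand calculations. Note that each worker only has to ship two numbers (its sum and its count), no matter how many rows it scanned.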
Hence, dividing on each node is not the same as dividing once across all orders. (Read "division" as "average" and the point is the same.)
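The same point in general form, as a short derivation. Here S_i and n_i are labels introduced for illustration: node i's sum(price) and record count, over k nodes with N = n_1 + ... + n_k records in total.

```latex
\text{average of averages} = \frac{1}{k}\sum_{i=1}^{k} \frac{S_i}{n_i}
\qquad\text{vs.}\qquad
\text{true average} = \frac{\sum_{i=1}^{k} S_i}{\sum_{i=1}^{k} n_i}
                    = \sum_{i=1}^{k} \frac{n_i}{N}\cdot\frac{S_i}{n_i}
```

Both are weighted averages of the per-node averages S_i/n_i. The naive version gives each node weight 1/k, while the true average gives it weight n_i/N (here 3/15, 5/15 and 7/15 instead of 1/3 each). They coincide when every node holds the same number of records, or when every node happens to have the same average, but not in general.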