by hknmtt
893 days ago
Actually, I think you can also just average each chunk and then add it to the existing data. Read N rows (say they all have one location, to keep it simple), average the data from the chunk, update/save the min and max, and move on to the next chunk. Do the same there, but now update the average by adding the chunk's average to the previously computed average and dividing by two. The result will be the same; disk IO will be the most limiting aspect. This "challenge" is not really a challenge. There is nothing complicated about it. It just seems "cool" when you say "process 1 billion rows the fastest you can".
That does not give the same result. E.g. the avg of {22.5, 23, 24} = 23.17... But averaging with the previous average at each step gives:
1. 22.5
2. (22.5 + 23)/2 = 22.75
3. (22.75 + 24)/2 = 23.375
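
To make the difference concrete, here is a minimal Python sketch (not anyone's actual solution to the challenge). It compares the "average the averages and divide by two" idea with keeping a running sum and count per location, which is one common way to get an exact mean across chunks. The names `RunningStats` and `naive_chunk_average` are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical per-location accumulator: keeps sum and count, not the mean."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, chunk):
        # Fold one chunk of readings into the running totals.
        for value in chunk:
            self.count += 1
            self.total += value
            self.minimum = min(self.minimum, value)
            self.maximum = max(self.maximum, value)

    @property
    def mean(self):
        # The mean is derived only at the end, so chunk sizes do not matter.
        return self.total / self.count


def naive_chunk_average(chunks):
    """The approach described above: average each chunk, then halve with the previous average."""
    avg = None
    for chunk in chunks:
        chunk_avg = sum(chunk) / len(chunk)
        avg = chunk_avg if avg is None else (avg + chunk_avg) / 2
    return avg


chunks = [[22.5], [23.0], [24.0]]

stats = RunningStats()
for chunk in chunks:
    stats.update(chunk)

print(naive_chunk_average(chunks))  # 23.375, weights later chunks more heavily
print(stats.mean)                   # about 23.17, the true mean of all three values
```

Min and max can be merged chunk by chunk without any trouble; only the mean needs the extra count, because each halving step shrinks the weight of everything seen before it.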