I don't think I'll put much effort into this project, since it is barely a proof of concept, but there are a couple of points I would like to dig into if I ever find the time:
1. How to reduce collection generation?
2. How to take advantage of the machine's cores to execute the Map/Reduce operations?
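
As a rough illustration of the second point, here is a minimal sketch of one way to spread a Map/Reduce over several cores. It is written in Python purely for illustration and is not this project's code: `chunked`, `_map_reduce_chunk`, and `parallel_map_reduce` are hypothetical names, and the approach assumes the reduce function is associative, so that reducing each chunk separately and then combining the partial results gives the same answer as a sequential reduce.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import reduce
from itertools import islice
import os


def chunked(items, size):
    """Yield successive non-empty lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def _map_reduce_chunk(task):
    # Runs inside a worker: map every item of the chunk, then reduce locally.
    map_fn, reduce_fn, chunk = task
    return reduce(reduce_fn, map(map_fn, chunk))


def parallel_map_reduce(map_fn, reduce_fn, items, workers=None,
                        executor_cls=ProcessPoolExecutor):
    """Map/Reduce over `items`, one chunk per worker (hypothetical helper).

    Assumes `reduce_fn` is associative. With the default process pool,
    `map_fn` and `reduce_fn` must be picklable (module-level functions,
    not lambdas).
    """
    workers = workers or os.cpu_count() or 1
    items = list(items)
    if not items:
        raise ValueError("items must not be empty")
    size = -(-len(items) // workers)  # ceiling division
    tasks = [(map_fn, reduce_fn, chunk) for chunk in chunked(items, size)]
    with executor_cls(max_workers=workers) as pool:
        partials = list(pool.map(_map_reduce_chunk, tasks))
    # Final reduce over one partial result per chunk, done in the caller.
    return reduce(reduce_fn, partials)
```

Reducing inside each worker keeps the data sent back to the caller small: only one partial result per chunk crosses the process boundary. Whether this actually pays off depends on how expensive the map function is compared with the cost of shipping the chunks to the workers.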