by coffeemug
4231 days ago
I'm not the author of the project, but I can think of two reasons. First, you can think of map/reduce as the infrastructure for higher-level operations (sort of the assembly language of large-scale data processing, which higher-level data processing systems compile down to). A breakthrough in the quality of that execution engine significantly improves the experience of doing higher-level work, so if someone finds a better way to run map/reduce jobs, it's a win for everyone. Having to ship JARs instead of Docker containers, and the lack of snapshots, are serious drawbacks of the existing map/reduce infrastructure that hurt users in real ways.

Second, specifying a map/reduce job as a simple web server that exposes API endpoints for grouping, mapping, and reduction is a dramatically simpler and more composable interface. Building higher-level infrastructure on top of this abstraction is an order of magnitude easier than building it on top of Hadoop, so it could be a better underlying platform for the generalization work being done in the community.
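To make the "job as a web server" idea concrete, here is a minimal sketch using only the Python standard library. Everything in it is hypothetical and not taken from the project: the `/map` and `/reduce` routes, the JSON payload shapes, the word-count logic, and the `run_job` driver, which in this sketch does the grouping (shuffle by key) itself. The point is only that the job author writes two plain functions behind HTTP endpoints.

```python
import json
import threading
from collections import defaultdict
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Hypothetical job logic (word count); inputs and outputs are JSON-ready.
def map_record(record):
    # Emit one [key, value] pair per word in a single input line.
    return [[word.lower(), 1] for word in record.split()]

def reduce_group(payload):
    # Combine all values that were grouped under one key.
    return {"key": payload["key"], "value": sum(payload["values"])}

# Hypothetical route table: the "API" the job exposes.
ROUTES = {"/map": map_record, "/reduce": reduce_group}

class JobHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        handler = ROUTES.get(self.path)
        if handler is None:
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        data = json.dumps(handler(body)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # silence per-request logging for the example

def post(base_url, path, payload):
    # POST a JSON payload to the job server and decode the JSON reply.
    req = Request(base_url + path, data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(req) as resp:
        return json.loads(resp.read())

def run_job(base_url, records):
    # Map phase: one HTTP call per input record; group values by key.
    groups = defaultdict(list)
    for record in records:
        for key, value in post(base_url, "/map", record):
            groups[key].append(value)
    # Reduce phase: one HTTP call per key, in sorted key order.
    return {k: post(base_url, "/reduce", {"key": k, "values": v})["value"]
            for k, v in sorted(groups.items())}

if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), JobHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}"
    print(run_job(url, ["the cat", "the dog"]))  # {'cat': 1, 'dog': 1, 'the': 2}
    server.shutdown()
```

Because the contract in this sketch is just HTTP plus JSON, the same driver could call a job server written in any language and packaged however you like (a Docker container, for instance), which is roughly the composability argument being made.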