by ferrerabertran
5220 days ago
Hi haberman, I'm one of the developers of Pangool. Let me try to clarify why we stated that; I understand it may sound aggressive. Pangool is based on an extension of the MapReduce model that we propose and call "Tuple MapReduce". It is explained in detail in this post: http://www.datasalt.com/2012/02/tuple-mapreduce-beyond-the-c... What this means is that in Pangool, if you worked with 2-sized Tuples, you would be able to do exactly what you do now with Java MapReduce. That includes custom RawComparators and arbitrary business logic anywhere in the MapReduce chain (Mapper, Combiner, Reducer). Using n-sized Tuples together with Pangool's group-by, sort-by and reduce-side join API simply means less and simpler code, at no loss of performance or flexibility. Note that Pangool is still a MapReduce API, so it doesn't add another level of abstraction. We designed Pangool with the aim of offering it as a replacement for the current MapReduce API. Therefore we are not labelling it a "higher-level API" but a comparable low-level API. We are also benchmarking Pangool to show that it doesn't impose a performance overhead: http://pangool.net/benchmark.html
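To make the 2-sized vs. n-sized Tuple point concrete, here is a toy in-memory sketch of the idea in Python. It is not Pangool's API: `tuple_mapreduce`, `group_by`, `sort_by` and the sample data are hypothetical names made up for illustration, and the sketch assumes the sort-by fields must start with the group-by fields.

```python
from itertools import groupby
from operator import itemgetter


def tuple_mapreduce(records, map_fn, reduce_fn, group_by, sort_by):
    """Toy, single-process model of a tuple-based MapReduce (hypothetical, not Pangool).

    group_by and sort_by are sequences of tuple field indexes.
    Assumption of this sketch: sort_by must start with the group_by fields.
    """
    if tuple(sort_by[:len(group_by)]) != tuple(group_by):
        raise ValueError("sort_by must start with the group_by fields")
    # "Map" phase: every input record emits zero or more tuples.
    tuples = [t for record in records for t in map_fn(record)]
    # "Shuffle" phase: sort by all sort_by fields, then group on the prefix.
    tuples.sort(key=itemgetter(*sort_by))
    output = []
    for key, group in groupby(tuples, key=itemgetter(*group_by)):
        # "Reduce" phase: each group arrives already ordered by sort_by.
        output.extend(reduce_fn(key, list(group)))
    return output


# 2-sized tuples behave like the classic (key, value) model: word count.
def wc_map(line):
    for word in line.split():
        yield (word, 1)


def wc_reduce(word, group):
    yield (word, sum(t[1] for t in group))


word_counts = tuple_mapreduce(["a b a"], wc_map, wc_reduce,
                              group_by=(0,), sort_by=(0,))

# 3-sized tuples (user, timestamp, page): group by user, sort by timestamp
# inside each group, so the reducer sees each user's first visit first.
visits = [("bob", 3, "/c"), ("ann", 2, "/b"), ("ann", 1, "/a")]


def first_visit(user, group):
    yield (user, group[0][2])


first_pages = tuple_mapreduce(visits, lambda r: [r], first_visit,
                              group_by=(0,), sort_by=(0, 1))
```

In the second job the ordering inside each group comes from the extra sort field rather than from a hand-written composite key and comparator, which is the kind of code the n-sized Tuple model is meant to save.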
|
Also, since the data model is more complicated and provides more features, it takes more code and a more complex implementation. This could be significant if you were trying to port the model to another language or implementation, or trying to formally prove things about the code or the mathematical model, etc.
I'm not saying it's not cool; I actually think it's a good and powerful abstraction -- I just object to the characterization of "all features and no tradeoffs".