|
|
|
|
|
by haberman
5220 days ago
|
|
> There is no tradeoff, just advantages. Though I don't have deep expertise in Hadoop, I find this claim highly suspect. High-level APIs achieve user-friendliness by making decisions/assumptions about the way a lower-level API will be used. I would be very surprised if there was no use case for which your API does impose a trade-off vs. the low-level Hadoop API. I feel much more confident using a high-level API if its author is up-front about what assumptions it's making. If the claim is that there is no trade-off vs. the low-level API, I generally conclude that the author doesn't understand the problem space well enough to know what those trade-offs are. I could be wrong, but this is my bias/experience. |
|
Pangool is based on an extension of the MapReduce model we suggest and call "Tuple MapReduce". This is explained in detail in this post: http://www.datasalt.com/2012/02/tuple-mapreduce-beyond-the-c...
What this means is that in Pangool, if you worked with 2-sized Tuples, you would be able to do exactly the same that you do now with Java MapReduce - That includes custom RawComparators and arbitrary business logic in any place of the MapReduce chain (Mapper, Combiner, Reducer). Using n-sized Tuples together with Pangool's group & sort by, reduce-side join API will only mean less code, easier code at no loss of performance or flexibility.
Realize that Pangool is still a MapReduce API so it doesn't add any level of abstraction.
We designed Pangool with the aim of offering it as a replacement of the current MapReduce API. Therefore we are not labelling it as a "higher-level API" but as comparable low-level API.
On the other hand we are also benchmarking Pangool to show it doesn't impose a performance overhead: http://pangool.net/benchmark.html