by posco 4483 days ago
Quick points:

1) Scalding has a DObject-like type: ValuePipe[+T].
2) The reason you must explicitly call .group to go to a keyed type is that a shuffle is costly; requiring the call makes it clear to people when they trigger one. If you don't like that, make an implicit def from TypedPipe[(K, V)] to Grouped[K, V].
3) You can easily use Scalding as a library, but most examples use our default runner. We use it as a library in Summingbird. But you are right that a good doc showing people what to do would help (hint: set up an implicit FlowDef and Mode, write your Scalding code, then call a method to run the FlowDef).
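A minimal in-memory sketch of point 2, assuming nothing about Scalding's internals: `ToyPipe`, `ToyGrouped`, `ShuffleCounter` and `AutoGroup` are hypothetical stand-ins, not Scalding classes, and no real shuffle happens. The explicit `group` call is the one place a simulated shuffle is counted, and the opt-in implicit conversion shows roughly what the suggested TypedPipe[(K, V)] to Grouped[K, V] conversion would look like.

```scala
import scala.language.implicitConversions

// Hypothetical in-memory stand-ins for TypedPipe and Grouped; they only
// mimic the typing, not Scalding's real classes or job planning.
object ShuffleCounter {
  var shuffles: Int = 0 // number of simulated shuffles triggered so far
}

final case class ToyPipe[T](items: Seq[T]) {
  def map[U](f: T => U): ToyPipe[U] = ToyPipe(items.map(f))

  // Only callable when T is a pair; this explicit call is where the
  // (simulated) shuffle cost becomes visible in the code.
  def group[K, V](implicit ev: T <:< (K, V)): ToyGrouped[K, V] = {
    ShuffleCounter.shuffles += 1
    val pairs: Seq[(K, V)] = items.map(ev)
    val grouped: Map[K, Seq[V]] =
      pairs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }
    ToyGrouped(grouped)
  }
}

final case class ToyGrouped[K, V](groups: Map[K, Seq[V]]) {
  // Sums each key's values and returns to the un-keyed pipe type.
  def sum(implicit num: Numeric[V]): ToyPipe[(K, V)] =
    ToyPipe(groups.toSeq.map { case (k, vs) => (k, vs.sum) })
}

// Opt-in: import AutoGroup._ only if you prefer implicit grouping
// over writing .group yourself.
object AutoGroup {
  implicit def pipeToGrouped[K, V](p: ToyPipe[(K, V)]): ToyGrouped[K, V] =
    p.group[K, V]
}
```

Keeping the conversion in its own object makes it opt-in: without `import AutoGroup._`, calling a grouped-only method on a pipe of pairs does not compile, which is the visibility the explicit `.group` is meant to provide.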


1) Ah, the ValuePipe is (relatively) new; thanks for the pointer.
2) You have to call `.group` explicitly in Scoobi as well; it transforms a DList[(K, V)] into a DList[(K, Iterable[V])] or similar. You don't have to call `.toTypedPipe` to get map and friends, though, since the result is just a DList.
3) I've actually written this exact integration, so I'm glad it's the approved method! The global, mutable Mode made me nervous, IIRC.
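For comparison, a rough sketch of the Scoobi-style shape described in point 2, assuming a hypothetical in-memory `ToyDList` (not Scoobi's real class): grouping returns the same collection type holding (key, values) pairs, so `map` and `filter` stay available without any conversion step.

```scala
// Hypothetical in-memory stand-in for a Scoobi-style DList; not Scoobi's API.
final case class ToyDList[A](elems: Vector[A]) {
  def map[B](f: A => B): ToyDList[B] = ToyDList(elems.map(f))
  def filter(p: A => Boolean): ToyDList[A] = ToyDList(elems.filter(p))

  // Grouping stays inside the same type: each key is paired with all of
  // its values, so map/filter work directly on the result.
  def group[K, V](implicit ev: A <:< (K, V)): ToyDList[(K, Iterable[V])] = {
    val pairs: Vector[(K, V)] = elems.map(ev)
    val grouped: Vector[(K, Iterable[V])] =
      pairs.groupBy(_._1).toVector.map { case (k, kvs) => (k, kvs.map(_._2)) }
    ToyDList(grouped)
  }
}
```

In this model, the only difference between grouped and ungrouped data is the element type, which is why no `.toTypedPipe`-style step is needed afterwards.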
The global Mode is gone in 0.9.0. And there is an implicit conversion from Grouped to TypedPipe, so you don't need to call .toTypedPipe (though calling it explicitly seems less likely to cause problems, especially since we have mapValues and filter on Grouped, and we should avoid needlessly leaving the Grouped representation).
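A sketch of the trade-off in this reply, using hypothetical `SketchPipe` and `SketchGrouped` types rather than Scalding's own: `mapValues` and `filter` keep data in the keyed form, while an implicit conversion in the companion object makes pipe-only methods such as `map` callable without an explicit `toPipe`, silently leaving the keyed representation.

```scala
import scala.language.implicitConversions

// Hypothetical stand-ins for a TypedPipe-like and a Grouped-like type;
// not Scalding's real classes.
final case class SketchPipe[T](items: Vector[T]) {
  def map[U](f: T => U): SketchPipe[U] = SketchPipe(items.map(f))
}

final case class SketchGrouped[K, V](pairs: Vector[(K, V)]) {
  // Both of these return SketchGrouped, so the keyed form is preserved
  // for any later per-key operation.
  def mapValues[W](f: V => W): SketchGrouped[K, W] =
    SketchGrouped(pairs.map { case (k, v) => (k, f(v)) })

  def filter(p: ((K, V)) => Boolean): SketchGrouped[K, V] =
    SketchGrouped(pairs.filter(p))

  // Explicit, easy-to-spot exit from the keyed form.
  def toPipe: SketchPipe[(K, V)] = SketchPipe(pairs)
}

object SketchGrouped {
  // Implicit exit: methods that only exist on SketchPipe, such as map,
  // become callable on a SketchGrouped via this conversion.
  implicit def groupedToPipe[K, V](g: SketchGrouped[K, V]): SketchPipe[(K, V)] =
    g.toPipe
}
```

Running `mapValues` and `filter` first and converting only at the end keeps the keyed form as long as possible; writing `toPipe` explicitly makes that exit point visible when reading the code.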
Neat! It looks like things are changing fast; I'll have to do another read-through.