by sigil
5503 days ago
This looks interesting. Questions: (1) What do you mean by a processing topology -- is this a data dependency graph? (2) How does one define a topology? Is this specified at deployment time via the jar file, or can it be configured separately and on the fly? (3) Must records be processed in time order, or can they be sorted and aggregated on some other key?
2. To deploy a topology, you give the master machine a jar containing all the code and a topology. The topology can be created dynamically at submit time, but once it's submitted it's static.
3. You receive the records in the order the spouts emit them. Things like sorting, windowed joins, and aggregations are built on top of the primitives that Storm provides. There's going to be a lot of room for higher-level abstractions on top of Storm, à la Cascalog/Pig/Cascading/Hive on top of Hadoop. We've already started designing a Clojure-based DSL for Storm.
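
To make point 3 a bit more concrete, here is a rough sketch in plain Python of how a keyed aggregation can sit on top of a stream that is simply consumed in emission order. This is not Storm's API: `spout` and `sum_by_key` are hypothetical names made up for illustration, and the sample records are invented.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, int]


def spout() -> Iterator[Record]:
    """Hypothetical source: emits (key, value) records in a fixed order."""
    yield from [("b", 1), ("a", 2), ("b", 3), ("a", 4)]


def sum_by_key(records: Iterable[Record]) -> Iterator[Record]:
    """Consume records in arrival order and keep a running total per key.

    After each record, emit the updated total for that record's key.
    """
    totals: Counter = Counter()
    for key, value in records:
        totals[key] += value
        yield key, totals[key]


results = list(sum_by_key(spout()))
print(results)
```

The records are never reordered here. The aggregation is keyed on something other than arrival time, but it is still just a step layered on an in-order stream, which is the kind of thing a higher-level abstraction would wrap up.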