|
|
|
|
|
by scaleout1
3681 days ago
|
|
As someone who has used Heron (along with MillWheel, Spark Streaming and Storm) I feel like this announcement is too late. The biggest thing Heron offer is raw scale but since they decided to use existing Storm API, it has the same shitty spout /bolt API that Storms offer. In contrast, Spark streaming/Flink/ Kafka Streaming are all offering map/flatmap/filter/sink based functional API. At twitter most teams used SummingBird on top of Heron to get the same functional API but summingbird didnt get a lot of traction outside twitter and I am not sure how actively maintained OSS version of summingbird is. Even if you bite the bullet and decide to use SB with Heron, you will still miss out on a lot of usecases as SB was mostly focused on doing read/transform/aggregate/write whereas most streaming problem that i have noticed outside of twitter involve doing read/transform/aggregate/decision/write. I suppose you can implement decisioning in SB but i havent seen it done. Comparing Heron to google millwheel is interesting because of the design choices they made. Heron support at least one and at most once message guarantees but at Twitter most job ran with acked turned off so it was at most once with acknowledged data loss ( they had a batchjob doing mop up work to pick up missing data). Google on the other hand implemented exactly once semantic by doing idempotent sinks/ watermarking and managing out of order messages plus deduping support. Since both Flink and Spark will be implementing Apache Beam (millwheel's predecessors) model, only reason I see someone picking heron instead of Flink/Spark is that they are operating at massive scale that flink/spark dont support yet |
|