by hadrianpaulo 1771 days ago
Interesting, but how does this compare as an alternative to Apache Flink's stream processing model?
3 comments

A big difference is the removal of windowing: Flink lets you aggregate or join events only so long as they arrive in the same temporally-bound window. You're required to have a window, and it's core to the semantics of your workflow.

Flow's model doesn't use windows, and allows for long-distance (in time) joins and aggregations. There's no concept of "late" data in Flow: it just keeps on updating the desired aggregate.
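To make the distinction concrete, here is a toy sketch in plain Python. It is not Flink's or Flow's API. The window size, the watermark rule (the highest event time seen so far, with no allowed lateness), and the function names are all assumptions chosen for illustration.

```python
from collections import defaultdict

WINDOW = 10  # hypothetical tumbling-window size, in event-time units

# (key, event_time, amount), listed in arrival order
events = [
    ("a", 1, 5),
    ("a", 4, 2),
    ("a", 25, 1),  # pushes the watermark past the window [0, 10)
    ("a", 3, 7),   # belongs to [0, 10) but arrives after it has closed
]

def windowed_sums(stream):
    """Sum per (key, window start); drop events whose window already closed."""
    sums = defaultdict(int)
    dropped = []
    watermark = float("-inf")  # assumption: watermark = max event time seen
    for key, ts, amount in stream:
        start = (ts // WINDOW) * WINDOW
        if start + WINDOW <= watermark:
            dropped.append((key, ts, amount))  # treated as "late"
            continue
        sums[(key, start)] += amount
        watermark = max(watermark, ts)
    return dict(sums), dropped

def running_sums(stream):
    """Windowless aggregate: every event updates the per-key total."""
    totals = defaultdict(int)
    for key, _ts, amount in stream:
        totals[key] += amount
    return dict(totals)

window_result, late = windowed_sums(events)
running_result = running_sums(events)
```

In this sketch the windowed version drops the out-of-order event, while the windowless version simply folds it into the running total.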

You do not need windowing in Flink. You can do everything you want with a regular KeyedStream, and you can join non-windowed streams using IntervalJoin.

That is if you want to use the high-level API. If you use the lower-level ProcessFunction, you have even more flexibility.
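For readers unfamiliar with interval joins, the semantics can be sketched in plain Python: two events with the same key are paired when the right event's timestamp falls within a bounded offset of the left event's timestamp. This is a batch illustration of the idea, not Flink code. The function name, the tuple layout, and the sample data are hypothetical, and a real streaming implementation would buffer keyed state instead of holding both inputs in memory.

```python
from collections import defaultdict

def interval_join(left, right, lower, upper):
    """Pair same-key events where left_ts + lower <= right_ts <= left_ts + upper."""
    right_by_key = defaultdict(list)
    for key, ts, payload in right:
        right_by_key[key].append((ts, payload))

    joined = []
    for key, ts, payload in left:
        for r_ts, r_payload in right_by_key[key]:
            if ts + lower <= r_ts <= ts + upper:
                joined.append((key, payload, r_payload))
    return joined

# Hypothetical (key, event_time, payload) records
orders = [("u1", 100, "order-1"), ("u2", 200, "order-2")]
payments = [("u1", 103, "pay-1"), ("u1", 150, "pay-2"), ("u2", 199, "pay-3")]

# Match payments from 2 time units before to 5 time units after each order
pairs = interval_join(orders, payments, lower=-2, upper=5)
```

No window is defined on either input here. The only temporal constraint is the relative bound between the two timestamps.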

Dataflow systems like Flink (or even better, differential dataflow [0]) are far more flexible and subsume map-reduce. This article feels like hyping up the durability of the Model T.

[0] https://github.com/TimelyDataflow/differential-dataflow

IANAE on Flink, especially when it comes to the internals. But I think that the decomposition of computations into distinct map and reduce functions seems to afford a bit more flexibility, since it can be useful to apply reductions separately from map functions, and vice versa. For example, you could roll up updates to entities over time just with a reduce function, and you could easily do so eagerly (when the data is ingested) or lazily (when the data is materialized into an external system). That type of flexibility is important when you want a realtime data platform that needs to serve a broad range of use cases.
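A minimal sketch of that eager-versus-lazy point, in plain Python rather than any real Flow or Flink API. The reducer, the document fields, and the function names are hypothetical. The sketch assumes the reduce function is associative, so applying it at ingestion or at materialization gives the same rollup.

```python
from functools import reduce

def reduce_entity(left, right):
    """Hypothetical reducer: merge two updates for the same entity."""
    return {
        "count": left["count"] + right["count"],
        "last_seen": max(left["last_seen"], right["last_seen"]),
    }

# (entity key, update document), in arrival order
updates = [
    ("user-1", {"count": 1, "last_seen": 10}),
    ("user-2", {"count": 1, "last_seen": 12}),
    ("user-1", {"count": 1, "last_seen": 7}),   # arrives out of order
    ("user-1", {"count": 1, "last_seen": 30}),
]

def rollup_eager(stream):
    """Reduce at ingestion: keep one rolled-up document per key."""
    state = {}
    for key, doc in stream:
        state[key] = reduce_entity(state[key], doc) if key in state else doc
    return state

def rollup_lazy(stream):
    """Store raw updates; reduce only when materializing the result."""
    log = {}
    for key, doc in stream:
        log.setdefault(key, []).append(doc)
    return {key: reduce(reduce_entity, docs) for key, docs in log.items()}

eager = rollup_eager(updates)
lazy = rollup_lazy(updates)
```

The same reduce function serves both paths. The choice between them trades storage of raw updates against work done at ingestion time.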