by boredandroid 4377 days ago
I am the author of the post.

I think in cases where you are running totally different computations in different systems the Lambda architecture may make a lot of sense.

However, one assumption you may be making is that the stream processing system must be limited to non-blocking, in-memory computations like sketches. A common pattern for people using Samza is actually to accumulate a large window of data and then rank it using a complex brute-force algorithm that may take 5 mins or so to produce results.

One of the points I was hoping to make is that many of the limitations people think stream processing systems must have (e.g. can never block, can't process large windows of data, can't manage lots of state) have nothing to do with the stream processing model and are just weaknesses of the frameworks they have used.

1 comments

>A common pattern for people using Samza is actually to accumulate a large window of data and then rank using a complex brute force algorithm that may take 5 mins or so to produce results.

I think the argument eventually comes down to what people mean by "batch" and "stream". Some people might describe the aforementioned Samza use case as (micro-)batch as opposed to stream processing.

All in all, I found your post insightful. If a system with fewer moving parts can handle the same data processing requirements, that will be appealing to the majority of users.

Yes, totally. In my definition the difference is that a stream processing system lets you define the frequency with which output is produced rather than forcing it to be "at the end of the data". This doesn't preclude blocking operations.
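
To make that definition concrete, here is a rough sketch in plain Python (not Samza's API). The names `windowed_rank`, `rank`, and `emit_every` are made up for illustration, and the sort in `rank` is only a stand-in for whatever expensive, blocking ranking you would actually run. The point is just that the caller chooses how often output is produced, and each emission is free to do slow work over the accumulated window.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[float, str]            # (event time in seconds, key); hypothetical shape
Ranking = List[Tuple[str, int]]


def rank(counts: Counter, top_n: int) -> Ranking:
    """Stand-in for a slow, blocking ranking step over one window."""
    # Sort by descending count, then by key so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


def windowed_rank(
    events: Iterable[Event], emit_every: float, top_n: int = 3
) -> Iterator[Tuple[float, Ranking]]:
    """Accumulate events and emit a ranking every `emit_every` seconds of event time.

    Assumes events arrive in timestamp order. Windows with no events are skipped.
    """
    counts: Counter = Counter()
    window_end = None
    for ts, key in events:
        if window_end is None:
            window_end = ts + emit_every
        # Close every window that ended before this event arrived.
        while ts >= window_end:
            if counts:
                yield window_end, rank(counts, top_n)
                counts.clear()
            window_end += emit_every
        counts[key] += 1
    # The input ended; flush whatever is left in the open window.
    if counts:
        yield window_end, rank(counts, top_n)


if __name__ == "__main__":
    stream = [(0, "a"), (1, "b"), (2, "a"), (10, "c"), (11, "c"), (12, "a")]
    for end, ranking in windowed_rank(stream, emit_every=5):
        print(end, ranking)
```

With `emit_every` set very large this degenerates into "output at the end of the data", i.e. batch; with it set small you get frequent results from the same code.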