by kiyoto 4377 days ago
>A common pattern for people using Samza is actually to accumulate a large window of data and then rank using a complex brute force algorithm that may take 5 mins or so to produce results.

I think the argument eventually comes down to what people mean by "batch" and "stream". Some people might describe the aforementioned Samza use case as (micro-)batch as opposed to stream processing.

All in all, I found your post insightful. If a system with fewer moving parts can handle the same data processing requirements, that will be appealing to the majority of users.

1 comment

Yes, totally. In my definition the difference is that a stream processing system lets you define the frequency with which output is produced rather than forcing it to be "at the end of the data". This doesn't preclude blocking operations.
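
To make that distinction concrete, here is a rough sketch in plain Python. It is not Samza's API; `ranked_outputs`, `emit_every`, and `top_n` are hypothetical names made up for illustration, and a frequency count stands in for the "complex brute force" ranking. The only thing that changes between the batch and the stream behaviour is when output is produced.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Optional, Tuple


def ranked_outputs(
    events: Iterable[str],
    emit_every: Optional[int] = None,  # hypothetical knob: output frequency
    top_n: int = 1,
) -> Iterator[List[Tuple[str, int]]]:
    """Accumulate a window of events, then rank it.

    emit_every=None: one result at the end of the data (batch-like).
    emit_every=k:    one result per window of k events (stream-like).
    """
    window: List[str] = []
    for event in events:
        window.append(event)
        if emit_every is not None and len(window) == emit_every:
            # Blocking step: rank the whole accumulated window, then reset.
            yield Counter(window).most_common(top_n)
            window = []
    # Flush at end of input: the whole data set in batch mode,
    # or a trailing partial window in stream mode.
    if emit_every is None or window:
        yield Counter(window).most_common(top_n)


events = ["a", "b", "a", "c", "c", "c"]
batch_results = list(ranked_outputs(events))
stream_results = list(ranked_outputs(events, emit_every=3))
```

With `emit_every=None` the caller gets a single ranking once the input is exhausted; with `emit_every=3` it gets one ranking per window, even though each ranking step still blocks on its full window.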