by nchammas
1939 days ago
> If you process data one row at a time, that is clearly a streaming pipeline, but most systems that call themselves streaming actually process data in small batches. From a user perspective, it’s an implementational detail, the only thing you care about is the latency target.

Author here. 100% agreed.

As an aside, I just came across your post about how Databricks is an RDBMS [0]. I recently wrote a similar article from a slightly more abstract perspective [1]. Having worked heavily with RDBMSs in the first part of my career, I feel like so many of the concepts and patterns I learned about there are being re-expressed today with modern, distributed data tooling. And that was part of my inspiration for this post about data pipelines.

[0] https://fivetran.com/blog/databricks-is-an-rdbms

[1] https://nchammas.com/writing/modern-data-lake-database