by mrlucax 2280 days ago
Does anyone know of any good resources to learn about data streams in general? Some weeks ago I had to implement some streams (in Node.js) to upload a file to S3 "on the fly", without storing the file locally and then uploading it, but I couldn't wrap my head around the data stream concept.
3 comments

Have you reviewed data-intensive architectures? I've found that book quite useful.

I'm just talking out my butt right now, but I think fundamentally a stream is just a sequence of chunks of data lifted from persistent storage into memory, one at a time. I imagine a cursor traversing the bytes of a file, lifting some of those bytes into memory, and sending that memory over the network before moving on to the next chunk.
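
For what it's worth, here is a rough sketch of that mental model in Node.js. It is deliberately synchronous and naive; Node's real stream classes do this asynchronously and handle backpressure for you. The helper name `readChunks`, the temp file, and the chunk size of 4 bytes are all made up for illustration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: walk a file with an explicit cursor and lift
// one chunk at a time into memory, instead of reading the whole file.
function* readChunks(filePath, chunkSize) {
  const fd = fs.openSync(filePath, "r");
  try {
    let position = 0; // the "cursor": byte offset of the next read
    while (true) {
      const buffer = Buffer.alloc(chunkSize);
      const bytesRead = fs.readSync(fd, buffer, 0, chunkSize, position);
      if (bytesRead === 0) break; // end of file
      position += bytesRead;
      yield buffer.subarray(0, bytesRead); // only the bytes actually read
    }
  } finally {
    fs.closeSync(fd);
  }
}

// Demo on a small temp file (path and contents are placeholders).
const demoPath = path.join(os.tmpdir(), "stream-demo.txt");
fs.writeFileSync(demoPath, "hello streams!");

const chunkSizes = [];
for (const chunk of readChunks(demoPath, 4)) {
  // A real consumer would send `chunk` over the network here.
  chunkSizes.push(chunk.length);
}
console.log(chunkSizes); // chunk lengths: 4, 4, 4, 2
```

The point is that at most `chunkSize` bytes of the file are held per step, no matter how large the file is.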

Yes, however I think there's a lot more complexity in modern streaming architectures: event sourcing, concurrency, eventual consistency, pub-sub, queueing, event handlers, microservices, stateless messages, data lakes - to name a few. I would also be interested in a resource that tackles the major concepts here.
> data-intensive architectures

I suspect you mean __Designing Data-Intensive Applications__, by Martin Kleppmann, but I am not entirely sure.

Yeah, I just didn't want to type it out :P
Check out dataflows in Composable for a quick way to implement these data streams. https://docs.composable.ai/en/latest/03.DataFlows/01.Overvie...
You can also use the multipart upload feature of S3. Basically, you buffer until you have a sizeable chunk of the incoming stream, then upload that chunk as a part, and repeat. At the end you tell S3 to concatenate the parts into a single object.
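
A minimal sketch of that buffer-then-upload loop, in case it helps. To keep it runnable without credentials it talks to a made-up in-memory `FakeMultipartStore` rather than the AWS SDK; with real S3 the three steps correspond to the CreateMultipartUpload, UploadPart, and CompleteMultipartUpload operations, and every part except the last has to be at least 5 MiB. The names `uploadInParts` and `demoSource` and the tiny part size of 5 bytes are assumptions for the demo only.

```javascript
// Hypothetical in-memory stand-in for an S3 multipart upload (not the AWS SDK).
class FakeMultipartStore {
  constructor() {
    this.parts = [];
  }
  async uploadPart(partNumber, body) {
    this.parts.push({ partNumber, body });
  }
  async complete() {
    // "Concatenate the parts" in part-number order.
    const ordered = [...this.parts].sort((a, b) => a.partNumber - b.partNumber);
    return Buffer.concat(ordered.map((part) => part.body));
  }
}

// Buffer incoming chunks until at least `partSize` bytes are pending,
// upload them as one part, and repeat; flush the remainder at the end.
async function uploadInParts(source, store, partSize) {
  let pending = [];
  let pendingBytes = 0;
  let partNumber = 1;
  for await (const chunk of source) {
    pending.push(chunk);
    pendingBytes += chunk.length;
    if (pendingBytes >= partSize) {
      await store.uploadPart(partNumber++, Buffer.concat(pending));
      pending = [];
      pendingBytes = 0;
    }
  }
  if (pendingBytes > 0) {
    // The last part is allowed to be smaller than `partSize`.
    await store.uploadPart(partNumber++, Buffer.concat(pending));
  }
  return store.complete();
}

// Demo source: five 3-byte chunks, standing in for an incoming stream.
async function* demoSource() {
  for (const text of ["aaa", "bbb", "ccc", "ddd", "eee"]) {
    yield Buffer.from(text);
  }
}

uploadInParts(demoSource(), new FakeMultipartStore(), 5).then((result) => {
  console.log(result.toString());
});
```

Node.js readable streams are async iterable, so something like an incoming request body or `fs.createReadStream(...)` could be passed as `source` in place of `demoSource()`. Only one part's worth of data is held in memory at a time.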