by memset 819 days ago
Let me take a look - thank you!

So if I'm understanding correctly, you actually read data directly from (say) S3? It isn't copied from S3 and stored locally (i.e., as a bunch of local .arrow files)?

(Apologies if I'm ignorant of the underlying tech - I think this is really cool and I'm just trying to wrap my head around what happens between "I upload some data to S3" and "we get query results")

1 comment

Yep, pretty much. Right now filesystem^ sources are finite: they scan the target path at operator startup time and process all matching files. Each file is read by opening an asynchronous reader, courtesy of the object_store crate.

^We call these Filesystem Sources/Sinks to match terminology present in other streaming systems, but I'm not in love with it.
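
To make the "finite" part concrete, here is a rough sketch of the scan-once-then-process pattern. It is not the actual operator code: FiniteFileSource, scan, run and demo are hypothetical names, it uses synchronous std::fs on a local temp directory as a stand-in for the asynchronous object_store reader against S3, and the file contents are placeholder bytes rather than real Arrow data.

```rust
use std::fs;
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// Hypothetical finite source: the file list is fixed when the source is created.
struct FiniteFileSource {
    files: Vec<PathBuf>,
}

impl FiniteFileSource {
    /// Scan `dir` once and keep every regular file whose extension equals `ext`.
    fn scan(dir: &Path, ext: &str) -> io::Result<Self> {
        let mut files = Vec::new();
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            if path.is_file() && path.extension().and_then(|e| e.to_str()) == Some(ext) {
                files.push(path);
            }
        }
        // read_dir order is platform-dependent, so sort for a stable order.
        files.sort();
        Ok(Self { files })
    }

    /// Read every file in the snapshot in chunks; return the total bytes read.
    fn run(&self) -> io::Result<u64> {
        let mut total = 0u64;
        let mut buf = [0u8; 8192];
        for path in &self.files {
            let mut file = fs::File::open(path)?;
            loop {
                let n = file.read(&mut buf)?;
                if n == 0 {
                    break;
                }
                total += n as u64;
            }
        }
        Ok(total)
    }
}

/// Small demo: two matching files, one non-matching file, one late arrival.
fn demo() -> io::Result<u64> {
    let dir = std::env::temp_dir().join(format!("finite_source_demo_{}", std::process::id()));
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("a.arrow"), b"1234")?;
    fs::write(dir.join("b.arrow"), b"5678")?;
    fs::write(dir.join("notes.txt"), b"ignored")?;

    let source = FiniteFileSource::scan(&dir, "arrow")?;
    // Written after the scan, so it is not part of the snapshot.
    fs::write(dir.join("c.arrow"), b"late")?;

    let total = source.run()?;
    fs::remove_dir_all(&dir)?;
    Ok(total)
}
```

In this sketch demo() only counts the bytes of a.arrow and b.arrow. The file written after the scan is skipped, which is what "finite" means here under the assumption that the path is listed exactly once at startup.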