|
|
|
|
|
by seertaak
1691 days ago
|
|
> What is already being used for what? Let's say you have some data. You can choose to store it in a relational DB, like SQLite or DuckDB, or you can store it in a parquet file (and load it into an Arrow buffer). And the point is that if you combine Arrow with, say, Spark, then as a user you can accomplish something similar to what you might accomplish with a relational DB. But you don't need to hassle with setting up a DB server and maintaining it. All you need is a job that outputs a parquet file, and uploads that to S3. And then Spark - through Arrow! - will allow you to execute queries against that DB. Using Arrow + Spark, you get the ability to a dataframe as if it's SQL, but you can still do pandas-style stuff i.e. treat it as a dataframe. OTOH you lose the more esoteric SQL stuff like fancy constraints, triggers, foreign keys. |
|