by isoprophlex
1226 days ago
Quick, in-core data transformation. If you want to transform some data right now, one option is writing PySpark and running it on a Spark cluster. But no one really has big, big data; there are relatively few cases where you have multi-TB datasets that warrant the complexity of running the analytics in a distributed way. DuckDB lets you process all that locally. It's the OLAP equivalent of SQLite's OLTP. If I weren't so beholden to the vagaries and inefficiencies of C-level-endorsed enterprise software, I'd immediately be trying this out for data transformations/pipelines. I think that one big box (200+ GB RAM, a couple of cores and fat IO/network) runs circles around an entire Spark cluster.
Is there a reason "in-core" is a specific requirement here?