|
|
|
|
|
by craydandy
543 days ago
|
|
Interesting and well-written article. Thanks to the author for writing it. Replacing Spark with these single-machine tools seems to be on the hype, and Spark is not en vogue anymore. The author ran Spark in Fabric, which has V-Order write enabled by default. DuckDB and Polars don't have this, as it's an MS proprietary algorithm. V-Order adds about 15% overhead to write, so it does change the result a bit. The data sizes were bit on a large size, at least for the data amounts I see daily. There definitely are tables in the 10GB, 100GB, and even in 1TB size range, but most tables traveling through data pipelines are much smaller. |
|