by mwc360 543 days ago
Author of the blog here: fair point. Pretty much every published benchmark has an agenda that ultimately skews the conclusion. I did my best to be impartial here, i.e., I fully designed the benchmark and each test before running code on any engine. The tests were meant to mimic typical ELT demands, and designing them up front meant I had no opportunity to optimize for Spark, which I know well.

I think you did a good job for these workloads. I did some informal experimenting last year when I had to implement an ELT-type system and I ended up doing it in Spark as well. It was my last choice, because I find operating and debugging Spark to be a huge pain. But everything else I tried was way slower.

I didn't think people used Polars much for ELT. I've usually seen it used for aggregations with small outputs (which, as you called out, it does a great job at).