by bllchmbrs 975 days ago
DataFrames are just SQL. Both go through the same optimizer and end up as the same execution plan, so there will be no performance difference.

RDDs will be worse, so it shouldn't matter. With RDDs there is no vectorization, no columnar processing, and lots of serialization and deserialization. They're basically always slower than DataFrames, barring some strange use case.
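
To make the serialization point concrete, here is a rough plain-Python sketch (not Spark code, and not a benchmark). It assumes a made-up dataset of (user_id, amount) rows and uses pickle as a stand-in for whatever serializer the engine uses; the function names are hypothetical. The first function treats every record as an opaque object that has to be serialized and deserialized before user code can touch it, roughly how RDD records behave. The second pulls out the one column it needs and aggregates over it, which is closer to what a schema-aware engine can do.

```python
import pickle

# Hypothetical toy data: (user_id, amount) rows. Not from any real workload.
rows = [(i, float(i % 7)) for i in range(1000)]


def row_at_a_time_total(rows):
    """RDD-style sketch: every record is round-tripped through a serializer
    before the user function can read a single field from it."""
    total = 0.0
    for row in rows:
        blob = pickle.dumps(row)        # serialize the whole record
        _, amount = pickle.loads(blob)  # deserialize it just to read one field
        total += amount
    return total


def columnar_total(rows):
    """DataFrame-style sketch: the schema is known, so only the needed
    column is extracted and aggregated, with no per-record round trip."""
    amounts = [amount for _, amount in rows]  # project the single column
    return sum(amounts)


row_total = row_at_a_time_total(rows)
col_total = columnar_total(rows)

# Same answer either way; the difference is how much work it took to get there.
print(row_total == col_total)
```

Both paths return the same total. The row-at-a-time version just does a pickle round trip per record to get there, and that per-record overhead is the kind of cost the comment above is pointing at.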