Hacker News
by nojito 1798 days ago
40% slower in groupbys and 4x slower in joins isn’t convincing.
2 comments

Oh I agree. What's convincing to me is the momentum. The DataFrames.jl team only started focusing on performance three months ago after hitting v1.0[1] and were able to rapidly become competitive with groupbys; the performance of join is next[2]. Compare the live view with the state when grandparent's blog post was written/updated (March of this year).

I expect it to continue to improve; note that it's starting to be the fastest implementation on some of the groupby benchmarks.

1. https://discourse.julialang.org/t/release-announcements-for-...

2. https://discourse.julialang.org/t/the-state-of-dataframes-jl...

This seems quite cherry-picked, as there are 3 different dataset sizes.

However, yes, it does not beat all the other packages tested on performance.

Not really cherry-picked. data.table is designed for large datasets with many groups and complex joins.