by martinsmit 1162 days ago
I think DF.jl works remarkably well as a "fits in RAM" dataframe backend, but it falls short on usability and integration with the wider Julia ecosystem. Or rather, the wider ecosystem isn't as mature in key data analysis areas.

In particular, as you mention, plotting is one of the evolving parts of the ecosystem. Plots.jl is fine, Makie is powerful but very DIY, and AoG is slick but unwieldy. ggplot2 is far from perfect, but it works so well due to its maturity and integration with the rest of the Tidyverse.

In my ideal world, there would be a DataFrames.jl wrapper to provide nice (not just nicer like the two DFM.jl packages) syntax, and a powerful high-level plotting package (Makie is powerful but syntax is low level, Plots is mid on both) which is heavily integrated with the wrapper package.

Admittedly, I'm not a data scientist (anymore) so I don't follow the new developments in the dataviz scene much. If something like this exists then I would love to find it.

I wonder what my ideal syntax would look like anyway. Maybe something close to Tidyverse but with symbols as column names `:col_name` for ambiguity reasons.

2 comments

I'd agree with all of this. It's not so much the presence of an ecosystem, it's the maturity of the ecosystem and knowing how to navigate it. Lots of competing approaches in a relatively small community make me a little nervous.

If people are curious, DataFrames.jl has done a fair bit of work in consolidating a list of other packages that complement DataFrames.jl:

https://dataframes.juliadata.org/latest/#DataFrames.jl-and-t...

I do have to say that DataFrames.jl itself is a pretty impressive bit of work, and Bogumił Kamiński deserves no small amount of credit. If Julia ends up stealing significant mindshare from the R / Python data science communities, it'll be because of this work.

Also for the Julia curious, I'll say that I find the ergonomics and sensibilities of the DataFrames.jl ecosystem much more in line with R and the Tidyverse than Python/Pandas.

Bogumil is a truly outstanding member of the community and DataFrames.jl is an impressive, versatile package.

From my perspective, however, DataFrames.jl's power is what makes it quite unergonomic for me. As an example, take the `args => transformations => result` syntax for doing pretty much anything in DataFrames. It's versatile, but Julia's lack of rank polymorphism, i.e. broadcasting/mapping has to be explicit (which is usually a good thing given that type polymorphism is Julia's whole schtick), means that the transformation syntax feels cumbersome.
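
To make that concrete, here is a minimal sketch of what I mean, assuming a toy data frame; the column names and the `label` helper are made up for illustration. The same elementwise operation has to be written either with a broadcasting dot or wrapped in `ByRow`:

```julia
using DataFrames

# Toy data, purely illustrative.
df = DataFrame(a = [1, 2, 3], b = [10, 20, 30])

# Whole-column form: the function receives the full column vectors,
# so the elementwise product needs an explicit broadcasting dot.
whole = transform(df, [:a, :b] => ((a, b) -> a .* b) => :product)

# Row-wise form: ByRow lifts a scalar function so it runs once per row.
rowwise = transform(df, [:a, :b] => ByRow((a, b) -> a * b) => :product)

# A scalar-only helper (hypothetical) also has to be lifted explicitly;
# passed unwrapped, it would receive the whole column vector and fail.
label(x) = x > 1 ? "big" : "small"
labelled = transform(df, :a => ByRow(label) => :bucket)
```

Both forms produce the same `:product` column. The complaint is only that you have to pick one and spell it out every time.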

It's not that I want everything rowwise by default (an option provided by DataFramesMacros.jl); it's that I want things to be rank polymorphic when it makes sense. Base R got this right, hell, S got this right, and so the Tidyverse inherited it, which makes those packages so much more ergonomic than they would otherwise be.

I cannot overstate how impressive DataFrames.jl is, but I have to caveat this with "but I really try to avoid using it if possible". It's a shame, but I just think R's laissez-faire hackability, which in many cases results in spaghetti code, works really well in the tabular programming world where ergonomics are king and performance is easy.

> In my ideal world, there would be a DataFrames.jl wrapper to provide nice (not just nicer like the two DFM.jl packages) syntax

Have you come across [Tidier.jl](https://github.com/TidierOrg/Tidier.jl) yet? It's a relatively recent package (to my understanding), but developing at a pretty rapid pace, and tries to be more ergonomic with its syntax.