by peepeepoopoo3 1260 days ago
I hate pandas with a burning passion, but one thing it does have going for it is (some) interoperability with numpy, which opens up the rest of the scipy ecosystem. How easy is it to get numpy arrays into and out of polars?
5 comments

Very easy.

`pl.from_numpy` and `series.to_numpy` are your friends here. For 1D columns, we can often be zero-copy as well.

Besides that, we support numpy ufuncs for `Series` and `Expressions`. As OP pointed out:

https://kevinheavey.github.io/modern-polars/performance.html...

Numpy can be used to speed up some functions by utilizing numpy ufuncs. Numpy drops the GIL and therefore they can still be executed in parallel.

An alternative I found recently is RedFrames [1], which wraps Pandas dataframes in a more consistent interface. That might be a better option if you need easy compatibility with Pandas.

[1] https://github.com/maxhumber/redframes

Though that does look slick, the project is only ~5 months old, which is a bit young for me to jump aboard.
RedFrames seems similar to pyjanitor, which is more mature, at least going by how long it has existed: https://github.com/pyjanitor-devs/pyjanitor
Oh that looks interesting
It looks like it's as simple as calling foo.to_numpy().
What do you hate about pandas so much? I miss it dearly now that I don't use Python anymore.
I'm not GP, but I find the pandas API incredibly inconsistent, and it's hard to remember how to do even simple transformations. For example, it sometimes overloads operators because it doesn't use built-in language features like lambdas. There are reasons for the inconsistency, but using alternatives like R's tidyverse or Julia's DataFrames.jl is like night and day for me.

I recently found RedFrames [1], which wraps Pandas dataframes with a more consistent interface. It's probably what I'd use if I had to write data transformations that had to be compatible with Pandas.

[1] https://github.com/maxhumber/redframes

Pandas gets the job done, and is overall easy to use and intuitive.

The problem is that it's a huge pile of hacks, exceptions, anti-patterns, and regressions.

The API is inconsistent, loose, and full of obscure options added as quick fixes.

It really can't be said enough what a mess pandas is. It has way too much surface area and no common thread pulling it all together. This becomes obvious when you work with better dataframe libs like dplyr [1] or DataFramesMeta [2]. I've worked on production systems with all of these libs, so this is not gratuitous bashing.

[1] https://dplyr.tidyverse.org/
[2] https://juliadata.github.io/DataFramesMeta.jl/stable/

Funny seeing you here