| HN Mirror



	by peepeepoopoo3 1260 days ago
	I hate pandas with a burning passion, but one thing it does have going for it is (some) interoperability with numpy, which opens up the rest of the scipy ecosystem. How easy is it to get numpy arrays into and out of polars?

5 comments

ritchie46 1260 days ago

Very easy.

`pl.from_numpy` and `series.to_numpy` are your friends here. For 1D columns, the conversion can often be zero-copy as well.

Besides that, we support numpy ufuncs for `Series` and `Expressions`. As OP pointed out:

https://kevinheavey.github.io/modern-polars/performance.html...

Numpy can be used to speed up some functions by utilizing numpy ufuncs. Numpy drops the GIL and therefore they can still be executed in parallel.


Hasnep 1260 days ago

An alternative I found recently is RedFrames [1] which wraps Pandas dataframes in a more consistent interface. That might be a better alternative if you need easy compatibility with Pandas.

[1] https://github.com/maxhumber/redframes


fbdab103 1260 days ago

Though that does look slick, the project is only ~5 months old, which is a bit young for me to jump aboard.


yuuuxt 1260 days ago

RedFrames seems similar to pyjanitor, which is more mature, at least going by how long it has existed: https://github.com/pyjanitor-devs/pyjanitor


atoav 1260 days ago

Oh that looks interesting


mbernstein 1260 days ago

It looks like it's as simple as calling `foo.to_numpy()`.


culi 1260 days ago

What do you hate about pandas so much? I miss it dearly now that I don't use Python anymore.


Hasnep 1260 days ago

I'm not GP, but I find the pandas API incredibly inconsistent, and I find it difficult to remember how to do simple transformations with it. For example, it sometimes overloads operators because it doesn't use built-in language features like lambdas. There are reasons for the inconsistency, but using alternatives like R's tidyverse or Julia's DataFrames.jl is like night and day for me.

I recently found RedFrames [1], which wraps Pandas dataframes with a more consistent interface. It's probably what I'd use if I had to write data transformations that had to be compatible with Pandas.

[1] https://github.com/maxhumber/redframes


Galanwe 1260 days ago

Pandas gets the job done, and is overall easy to use and intuitive.

The problem is that it's a huge pile of hacks, exceptions, anti-patterns, and regressions.

The API is inconsistent, loose, and full of obscure options added as quick fixes.


pupperino 1260 days ago

It really can't be said enough what a mess pandas is. It has way too much surface area and no common thread pulling it all together. This becomes obvious when you work with better dataframe libs like dplyr [1] or DataFramesMeta [2]. I've worked on production systems with all of these libs; this is not gratuitous bashing.

[1] https://dplyr.tidyverse.org/ [2] https://juliadata.github.io/DataFramesMeta.jl/stable/


topaz0 1260 days ago

Funny seeing you here
