by brahbrah 1259 days ago
I like polars a lot. It’s better than pandas at what it does. But it only covers a subset of what pandas does. Indexes are not just an implementation detail of dataframes. They are fundamental to representing data whose dimensional structure matters. Polars is great for cases where you want to work with data in “long” format, but that’s not always the most convenient way to work with data. Let’s say you want the difference in 15-day-ahead temperature forecasts between forecasts made on 2 different mark dates, for the forecast days where they overlap (say the data consists of forecasted date, country, state equivalent, temp). In long format (necessarily in polars, optionally in pandas) you have to do:

    Merge df 1 and 2 on country, state and forecasted date, then create a new column of the diff between the 2 temp columns, then drop the 2 original temp columns. 
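For concreteness, here is a minimal pandas sketch of those long-format steps. The column names (`forecasted_date`, `country`, `state`, `temp`) and the sample values are made up for illustration, and an inner join is assumed as the way to keep only the overlapping forecast days.

```python
import pandas as pd

# Hypothetical long-format forecasts issued on two different mark dates.
df1 = pd.DataFrame({
    "forecasted_date": pd.to_datetime(
        ["2023-01-02", "2023-01-03", "2023-01-02", "2023-01-03"]),
    "country": ["US", "US", "CA", "CA"],
    "state": ["NY", "NY", "ON", "ON"],
    "temp": [1.0, 2.0, -3.0, -4.0],
})
df2 = pd.DataFrame({
    "forecasted_date": pd.to_datetime(
        ["2023-01-03", "2023-01-04", "2023-01-03", "2023-01-04"]),
    "country": ["US", "US", "CA", "CA"],
    "state": ["NY", "NY", "ON", "ON"],
    "temp": [2.5, 3.0, -5.0, -6.0],
})

keys = ["country", "state", "forecasted_date"]

# 1. Merge on the key columns; the inner join keeps only overlapping days.
merged = df1.merge(df2, on=keys, how="inner", suffixes=("_1", "_2"))

# 2. Create the diff column from the two temp columns.
merged["temp_diff"] = merged["temp_1"] - merged["temp_2"]

# 3. Drop the two original temp columns.
long_diff = merged.drop(columns=["temp_1", "temp_2"])
```

With this sample data only 2023-01-03 appears in both frames, so `long_diff` ends up with one row per (country, state) pair.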
In a format where your indexes are forecasted dates on the rows and multiindex of country, state on the columns, you just have to do:

    df1 - df2
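A rough sketch of the wide-format version in pandas, assuming the data starts out long and is pivoted first. `make_long` and `to_wide` are hypothetical helpers and the values are invented. One thing to be aware of: the subtraction aligns on both the row index and the column MultiIndex, so dates present in only one frame come back as NaN rows; dropping them here is one way to keep just the overlap.

```python
import pandas as pd

def make_long(dates, us_ny, ca_on):
    # Hypothetical helper: builds a tiny long-format forecast table.
    n = len(dates)
    return pd.DataFrame({
        "forecasted_date": pd.to_datetime(dates * 2),
        "country": ["US"] * n + ["CA"] * n,
        "state": ["NY"] * n + ["ON"] * n,
        "temp": us_ny + ca_on,
    })

def to_wide(long_df):
    # Rows: forecasted_date. Columns: MultiIndex of (country, state).
    return long_df.pivot(
        index="forecasted_date", columns=["country", "state"], values="temp")

wide1 = to_wide(make_long(["2023-01-02", "2023-01-03"], [1.0, 2.0], [-3.0, -4.0]))
wide2 = to_wide(make_long(["2023-01-03", "2023-01-04"], [2.5, 3.0], [-5.0, -6.0]))

# Label-aligned subtraction: no explicit join keys needed.
diff = wide1 - wide2

# Dates that exist in only one frame are all-NaN rows; keep the overlap.
overlap = diff.dropna(how="all")
```

The join logic has not gone away; it has moved into the index, which is why the arithmetic itself stays a one-liner.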
The way I see it, pandas is a toolkit that lets you easily convert between these 2 representations of data. You could argue that polars is better than pandas for working with data in long format, and that a library like xarray is better than pandas for working with data in the dimensionally relevant structure, but there is a lot of value in having both paradigms in one library with a unified API/ecosystem.