by internet101010
586 days ago
DataFrames are easy to use, everyone knows how to use them, you can move fast, it's easy to iterate and test differences between approaches, and reviewing the code is a breeze. That said, my team moved to polars about a year ago and we haven't looked back.
I have the opposite opinion. In a previous codebase I fought hard to use dataclasses and type hints over dictionaries wherever possible, because with dictionaries you never knew what type anything was or which keys were present. That worked nicely, and the codebase became much easier to understand.
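A minimal sketch of the contrast described above, using only the standard library. The `Order` record and the `apply_discount` helper are hypothetical names made up for illustration; the point is that the dataclass and `TypedDict` declare their keys and value types up front, while the plain dict leaves readers to guess.

```python
from dataclasses import dataclass
from typing import TypedDict

# Plain dict: keys and value types are only discoverable by reading every caller.
order_dict = {"order_id": 17, "customer": "acme", "total": 42.5}

# Dataclass: fields and types are declared once, visible to readers and type checkers.
@dataclass
class Order:
    order_id: int
    customer: str
    total: float

# TypedDict: still a plain dict at runtime, but a type checker can verify its keys.
class OrderDict(TypedDict):
    order_id: int
    customer: str
    total: float

def apply_discount(order: Order, pct: float) -> Order:
    """Hypothetical helper: return a new Order with `total` reduced by `pct` percent."""
    return Order(order.order_id, order.customer, round(order.total * (1 - pct / 100), 2))

typed: OrderDict = {"order_id": 17, "customer": "acme", "total": 42.5}
discounted = apply_discount(Order(**typed), 10)
print(discounted)  # Order(order_id=17, customer='acme', total=38.25)
```

The signature of `apply_discount` alone tells you what goes in and what comes out, which is exactly what a function taking and returning an untyped `df` does not.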
Now I've been put on a Pandas project and it's full of mysterious
I just feel like we've gone back to the unreadability of dictionaries. Everything's just called "df", you never know what type anything is without going in and checking, the structure of the frames is completely opaque, and they change the structure of the dataframe halfway through the program. Type hinting these frames is much harder than with a TypedDict or dataclass, at least if you want to do it correctly and unambiguously. It's practically a requirement to shove this stuff into a debugger or REPL, because otherwise you'd have no chance.
Sure, the argument is that I'm just in a bad Pandas codebase and that it can be done much better. What I take issue with, however, is that this seems to be the overwhelming "culture" of Pandas. All the Pandas code I've ever read is like this. If you look at tutorials and examples online, you see the same stuff: they call everything by the same name and program in the most dynamic and opaque fashion possible. Sure, it's quick to write, and if you love Pandas you're used to it, but personally I wince every time I open a method and see this stuff instead of normal code.
Personally, I only use Pandas as a last resort, when I absolutely need it for performance.