| HN Mirror

In my opinion, pandas is fundamentally broken and unsuitable for any production workload.

The heavy lifting should be left to an RDBMS, as you say: something with a sensible, battle-hardened query planner. I've written and debugged too many lines of manual pandas joins/merges; something declarative like SQL is much nicer because the query planner is almost always right.
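To make the comparison concrete, here is a rough sketch of the same join-and-aggregate done both ways. The tables, column names and values are invented for illustration, and an in-memory SQLite database stands in for a real RDBMS:

```python
import sqlite3

import pandas as pd

# Hypothetical toy tables; names and values are made up for this sketch.
orders = pd.DataFrame(
    {"order_id": [1, 2, 3], "customer_id": [10, 20, 10], "amount": [5.0, 7.5, 2.5]}
)
customers = pd.DataFrame({"customer_id": [10, 20], "name": ["ada", "bob"]})

# pandas: every step (join type, key, cardinality check, aggregation) is spelled out by hand.
merged = orders.merge(customers, on="customer_id", how="inner", validate="many_to_one")
totals_pd = (
    merged.groupby("name", as_index=False)["amount"]
    .sum()
    .sort_values("name")
    .reset_index(drop=True)
)

# SQL: state the result you want and let the engine decide how to execute it.
con = sqlite3.connect(":memory:")
orders.to_sql("orders", con, index=False)
customers.to_sql("customers", con, index=False)
totals_sql = pd.read_sql_query(
    """
    SELECT c.name, SUM(o.amount) AS amount
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """,
    con,
)
con.close()

print(totals_pd)
print(totals_sql)
```

On data this small the two are equivalent. The difference I care about shows up when the join order and strategy start to matter, which is exactly what a planner is for.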

Furthermore, as a user, I've always found the pandas API to be very confusing. I'm always having to interrupt my workflow to figure out boring details about the API (is it df.groupby().rolling(center=True).median() or some other permutation?), whereas e.g. PySpark or SQL are so much more ergonomic.
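For the record, here is one spelling of that chain that does run, as far as I can tell. The frame and its column names (key, value) are made up for the example:

```python
import pandas as pd

# Hypothetical data: two groups of three observations each.
df = pd.DataFrame(
    {
        "key": ["a", "a", "a", "b", "b", "b"],
        "value": [1, 5, 3, 10, 30, 20],
    }
)

# The pandas method is lowercase `groupby`; the window size is a required
# argument of `rolling`, and `center=True` belongs to `rolling`, not `median`.
rolled = df.groupby("key")["value"].rolling(window=3, center=True).median()

# The result is indexed by (key, original row label). With the default
# min_periods, the edges of each group are NaN because the window is incomplete.
print(rolled)
```

Note that the result comes back with a two-level index rather than aligned to the original frame, which is one more detail to look up every time.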

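Finally, typing inside pandas dataframes is a complete and utter nightmare. The default int64 dtype has no null value, so a column gets upcast to float64 as soon as a missing value appears (unless you opt into the nullable Int64 dtype), and then there's the idiocy around datetimes expressed as epoch nanoseconds...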
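A small sketch of what I mean, using throwaway values. The datetime column is built explicitly with nanosecond resolution so the example does not depend on what a given pandas version infers by default:

```python
import numpy as np
import pandas as pd

# A plain integer column uses NumPy's int64, which cannot represent a missing value.
ints = pd.Series([1, 2, 3])
print(ints.dtype)

# Reindexing onto a label that does not exist introduces a missing value,
# and the whole column is upcast to float64 to hold NaN.
with_gap = ints.reindex([0, 1, 2, 3])
print(with_gap.dtype)

# The nullable extension dtype "Int64" (capital I) keeps integers and stores <NA>,
# but you have to ask for it explicitly.
nullable = pd.Series([1, 2, None], dtype="Int64")
print(nullable.dtype, int(nullable.isna().sum()))

# datetime64[ns] values are integers counting nanoseconds since the Unix epoch.
one_second = pd.Series(np.array(["1970-01-01T00:00:01"], dtype="datetime64[ns]"))
epoch_ns = int(one_second.astype("int64").iloc[0])
print(epoch_ns)
```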

Pandas is nice for noodling around in notebooks. But for me, it should never be used beyond that.