In my opinion, pandas is fundamentally broken and unsuitable for any production workload.
The heavy lifting should be left to an RDBMS, as you say: something with a sensible, battle-hardened query planner. I've written and debugged too many lines of manual pd joins/merges; something declarative like SQL is much nicer because the query planner is almost always right.
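To make the comparison concrete, here is a rough sketch of the same join-and-aggregate written both ways. The tables (`orders`, `customers`) and their columns are made up for illustration, and SQLite stands in for a real RDBMS only because it ships with Python; it is not meant as a benchmark of either approach.

```python
import sqlite3

import pandas as pd

# Hypothetical toy data, invented purely for this illustration.
orders = pd.DataFrame(
    {"order_id": [1, 2, 3], "customer_id": [10, 20, 10], "amount": [5.0, 7.5, 2.5]}
)
customers = pd.DataFrame({"customer_id": [10, 20], "name": ["Ada", "Bob"]})

# pandas: the join type, the key and the expected cardinality are all spelled
# out by hand; validate= raises if customers has duplicate keys.
merged = orders.merge(customers, on="customer_id", how="inner", validate="many_to_one")
pandas_totals = merged.groupby("name")["amount"].sum().to_dict()

# SQL: state the result you want and let the engine decide how to execute it.
conn = sqlite3.connect(":memory:")
orders.to_sql("orders", conn, index=False)
customers.to_sql("customers", conn, index=False)
rows = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
conn.close()
sql_totals = dict(rows)

print(pandas_totals)
print(sql_totals)
```

On data this small both give the same totals; the difference I care about is who is responsible for the execution strategy once the tables get large.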
Furthermore, as a user, I've always found the pandas API to be very confusing. I'm always having to interrupt my workflow to figure out boring details about the API (is it df.groupby().rolling(center=True).median() or some other permutation?), whereas e.g. PySpark or SQL are so much more ergonomic.
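For what it's worth, this is the ordering that works for me on recent pandas versions: group first, select the column, then `rolling(...)` with the window size and `center=True`, then the aggregation. The frame and column names below are invented for the example.

```python
import pandas as pd

# Hypothetical toy frame: two groups of three observations each.
df = pd.DataFrame(
    {
        "key": ["a", "a", "a", "b", "b", "b"],
        "value": [1.0, 5.0, 3.0, 10.0, 30.0, 20.0],
    }
)

# Group, pick the column, roll within each group, then aggregate.
# window and center both belong to rolling(), not to groupby() or median().
rolled = df.groupby("key")["value"].rolling(window=3, center=True).median()

# The result is indexed by (key, original row label). With a centered window
# of 3, only the middle row of each group has a full window; the edges are NaN.
print(rolled)
```

Note that the result comes back with a two-level index rather than aligned to the original rows, which is exactly the kind of detail I end up looking up every time.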
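Finally, typing inside pd dataframes is a complete and utter nightmare. The default int64 has no way to represent a null, so an integer column with a missing value gets silently upcast to float (you have to opt in to the nullable Int64 dtype to avoid that), and then there's the idiocy around datetimes expressed as epoch nanoseconds...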
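A small sketch of the two typing issues, as I understand the behaviour in recent pandas versions. The values are arbitrary, and the datetime is built with an explicit nanosecond unit so the example does not depend on whatever default resolution your pandas version infers.

```python
import numpy as np
import pandas as pd

# A plain integer column is backed by NumPy int64, which cannot hold a null.
ints = pd.Series([1, 2, 3])
print(ints.dtype)

# Reindexing onto a label that does not exist introduces a missing value,
# and the whole column is silently upcast to float64 to make room for NaN.
with_gap = ints.reindex([0, 1, 2, 3])
print(with_gap.dtype)

# The opt-in nullable extension dtype "Int64" (capital I) stays integer
# and represents the gap as pd.NA instead.
nullable = pd.Series([1, 2, None], dtype="Int64")
print(nullable.dtype, nullable.isna().sum())

# A datetime64[ns] column is an int64 count of nanoseconds since the Unix
# epoch underneath; one second after the epoch is 1_000_000_000.
ts = pd.Series(np.array(["1970-01-01T00:00:01"], dtype="datetime64[ns]"))
epoch_ns = int(ts.astype("int64").iloc[0])
print(epoch_ns)
```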
Pandas is nice for noodling around in notebooks. But for me, it should never be used beyond that.