by cuteboy19 1324 days ago
Why does pandas code often feel ugly and clunky compared to the equivalent SQL? Is there no better way to do this?
2 comments

I find Pandas and SQL to be complementary rather than an either-or choice. For anything in the tens of GB range or smaller, it’s easy enough to move between the two with read_sql_query and to_sql.
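For anyone who hasn't used those two calls, here is a rough sketch of the round trip. It assumes an in-memory SQLite database as a stand-in for a real one, and the table and column names (orders, customer, amount) are made up for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for a real database (assumption for illustration).
conn = sqlite3.connect(":memory:")

# Hypothetical data; the table and column names are invented for this sketch.
orders = pd.DataFrame(
    {"customer": ["a", "b", "a"], "amount": [10.0, 5.0, 7.5]}
)

# pandas -> SQL: write the DataFrame out as a table.
orders.to_sql("orders", conn, index=False, if_exists="replace")

# SQL -> pandas: let the database aggregate, then load the result.
totals = pd.read_sql_query(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY customer",
    conn,
)
conn.close()

print(totals)
```

With a real database you would most likely pass a SQLAlchemy engine instead of the sqlite3 connection; the two pandas calls stay the same.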

The general strategy is to build the core of any dataset as a SQL query that handles joins and performance-sensitive parts of the query, then polish/plot/yeet into weird shapes with Pandas since it offers much greater expressivity.
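A minimal sketch of that split, again assuming in-memory SQLite and a made-up schema (customers, orders, region, month, amount): the join and aggregation run in SQL, and the reshaping happens in pandas.

```python
import sqlite3

import pandas as pd

# In-memory SQLite with a hypothetical two-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, month TEXT, amount REAL);
    INSERT INTO customers VALUES (1, 'east'), (2, 'west');
    INSERT INTO orders VALUES
        (1, '2021-01', 10), (1, '2021-02', 20), (2, '2021-01', 5);
    """
)

# Core dataset: the join and the aggregation are done by the database.
core = pd.read_sql_query(
    """
    SELECT c.region, o.month, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region, o.month
    """,
    conn,
)
conn.close()

# Polish in pandas: reshape long -> wide, one column per month,
# filling region/month pairs that have no orders with 0.
wide = core.pivot(index="region", columns="month", values="revenue").fillna(0.0)

print(wide)
```

The pivot is the kind of step that is awkward in plain SQL but a one-liner in pandas, which is roughly where I draw the line between the two.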

What bugs me about pandas is that it is so copy-heavy. I just wanted to know if there was some Pythonic way to get performance without just writing normal SQL.
Any specific examples you have in mind?