I find Pandas and SQL to be complementary rather than an either-or situation. For anything in the tens-of-GB range or smaller, it’s easy enough to move between the two with read_sql_query and to_sql.
The general strategy is to build the core of any dataset as a SQL query that handles joins and performance-sensitive parts of the query, then polish/plot/yeet into weird shapes with Pandas since it offers much greater expressivity.
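To make that concrete, here is a rough sketch of the round trip, assuming an in-memory SQLite database. The `customers`, `orders`, and `region_totals` tables and their columns are made up for illustration; any DB-API or SQLAlchemy connection that pandas accepts should work along the same lines.

```python
import sqlite3

import pandas as pd

# Hypothetical toy schema, only here so the example runs on its own.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
    """
)

# Step 1: let SQL do the join and aggregation (the performance-sensitive core).
core = pd.read_sql_query(
    """
    SELECT c.region, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """,
    conn,
)

# Step 2: do the polishing in pandas, e.g. add a derived column.
core["share"] = core["total"] / core["total"].sum()

# Step 3: push the result back into the database if something else needs it.
core.to_sql("region_totals", conn, index=False, if_exists="replace")
```

The split is a judgment call: the join and GROUP BY stay in SQL where the database can optimize them, and the derived column is the kind of small reshaping step that is quicker to write in pandas.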
What bugs me about pandas is that it is so copy-heavy. I just wanted to know if there was some Pythonic way to get performance without just writing plain SQL.