|
|
|
|
|
by chrishynes
2422 days ago
|
|
The problem isn't the cursor itself, but that it is usually a symptom of procedural thinking vs set based thinking. In an RDBMS it's typically far faster to puzzle out the joins, CTE's, and set based expressions and functions to use to winnow down a dataset vs a cursor based procedural logic on a row by row basis. |
|
Without disagreeing with any of the above, one important consideration is what you're going to do with the query. If all you want to know is a column's mean or some other simplified statistical value, there's really no sense in pulling all the data into Python just to calculate it. Do it inside the DB itself with SQL.
On the other hand, if you need that data to do other work (i.e. populate the table in a webpage, or generate a new descriptive data set or whatever), then the trade-off for pulling it into Python/pandas and running a mean in addition to the other work becomes much smaller.
My approach is usually to do as much data filtering and parsing as possible inside SQL, but things like complex parsing and string manipulation (especially!) I'll do with Python. I can do some simple string work in SQL, but I can almost always do it faster and cleaner in Python.