| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by phumbe 2417 days ago
	What is a better approach? I have Python that directly connects to an Oracle database, and the Oracle blog tutorial[0] for using their Python package always uses a cursor. [0] https://blogs.oracle.com/oraclemagazine/perform-basic-crud-o...

2 comments

marcosdumay 2417 days ago

I imagine none of you or the GP is talking about in-database cursors, that you open in SQL, use on the same SQL script, close at the end of the script and move along. There isn't really a problem inherent to those, and they stay non-problematic if you are writing your database scripts on Python, C, or whatever.

For a server reading and writing from/on a database, you paginate your queries. On Postgres that would be using "limit" and doing small bulk inserts, but under Oracle your options for inserting are limited, so there is still value on cursors.

There is the odd occasion where you'll will iterate through the entire results set, do something fast for each and every row, and only get something useful on the end. Those are classic problems where cursors are more efficient than the alternatives. But even on those, once you add error handling and recovering may yield better results with pagination.

link

ansgri 2417 days ago

They talk about client-side cursors, these are part of the Python DB API. What you shouldn't use is server-side cursor, which you create with DECLARE in PosgreSQL. These keep their state on the server and are intended for optimizations for special cases like streaming data processing or realtime updates. Basically, for deep internals of realtime systems, and not regular queries.

link

goatinaboat 2417 days ago

What you shouldn't use is server-side cursor, which you create with DECLARE in PosgreSQL.

That entirely depends on the DB. Oracle for example always executes with a server-side cursor; all declaring it does is give you a handle to what it was doing anyway.

link

chrishynes 2417 days ago

The problem isn't the cursor itself, but that it is usually a symptom of procedural thinking vs set based thinking. In an RDBMS it's typically far faster to puzzle out the joins, CTE's, and set based expressions and functions to use to winnow down a dataset vs a cursor based procedural logic on a row by row basis.

link

bshipp 2417 days ago

Thank you for clarifying that; I was wondering where the concern was coming from.

Without disagreeing with any of the above, one important consideration is what you're going to do with the query. If all you want to know is a column's mean or some other simplified statistical value, there's really no sense in pulling all the data into Python just to calculate it. Do it inside the DB itself with SQL.

On the other hand, if you need that data to do other work (i.e. populate the table in a webpage, or generate a new descriptive data set or whatever), then the trade-off for pulling it into Python/pandas and running a mean in addition to the other work becomes much smaller.

My approach is usually to do as much data filtering and parsing as possible inside SQL, but things like complex parsing and string manipulation (especially!) I'll do with Python. I can do some simple string work in SQL, but I can almost always do it faster and cleaner in Python.

link