|
|
|
|
|
by dagss
510 days ago
|
|
I agree that for a) inserts and b) single entity access -- in both these cases the backend can do a lot of the preparation. And your example is both a) and b). We are then in O(1) optimization territory. If you are only processing a single entity -- the backend should tell the DB exactly what to do. And one shouldn't have if-statements in SQL of course that is "doing it wrong". But if you have a chunk of 10000 entities like that in your example, all of different types, then you will have to insert some subset of data into all those tables (1000 in one tables, 500 another table, and so on). That logic is well suited for where conditions without much overhead. But yes for inserts most of the logic can usually be shifted to the DB client as that is where the data resides already. The problem I was talking about was meaningfully shifting for to the client for queries, where the client has no data to work with and must fetch it from the DB. Let us take your example and turn it into "fetch 10000 such objects". Fetching the right rows for all of them at once using joins and where conditions (+temporary tables and multiple return sets in the same query roundtrip) is going to be more efficient for the DB than the backend first fetching the type, then branching on type, then fetching from another table and so on. |
|
Nope, not if done correctly.
Now, this isn't to say there's not valid reasons to do it all at once in the DB, the chief among them being ACID requirements. However, from an efficiency standpoint both for the application and the DB the most efficient action is to first request from the parent table and then turn around and, in parallel, send out requests for the child tables of the various types as needed.
Assuming you have a connection pool, the overhead of doing multiple requests in parallel is small. The DB has less data to lookup. The DB has less temporary memory to store (which in our case was a problem). The response io is smaller (not a bunch of empty columns sent back) and both the DB and the downstream application are capable of querying against these tables in parallel.
There is a latency downside in needing the load up the parent table first, if the datasize is large enough then you could overcome that problem by making batch requests to the DB as the parent dataset comes back. Say every 1k values of a given type start the parallel request to load that data.
Splitting the request into these smaller and parallel requests also has systemic benefits to the DB, new writers are able to sneak in which isn't possible when you try to do everything at one go (another issue we had).
The added benefit here is the optimizer in the DB is more likely to do the right thing for the subtable requests than it is for the temp table mess request. A simple fetch is far easier to optimize than a complex one.