Hacker News new | ask | show | jobs
by vbilopav 1150 days ago
I don't understand it either.

Author seems to be arguing against long functions/procedures. But if you move that to the client, presumably with ORM support - you're going to be executing more or less the same sequence of SQL queries and commands. Only difference is that when doing it on client you will have a lot of latency.

Yes, you can cache some data in between those commands to avoid same multiple queries, but if you use temp tables to do so, they will use memory only if it is available, otherwise you are limited with the actual storage size.

Only time I had memory issues with PostgreSQL when I used too much data in arrays and json's. Those are memory only variables. For example, I'd return a huge json to client and I'd run out of memory on PostgreSQL. I started streaming row by row and problem solved.

3 comments

Fwiw the specific case which motivated that section in the post was a set of recursive functions we used to denormalise an irregular graph structure (so not suitable for CTE) into a single blob of JSON to be sent to another data store. 99% of the time there were no issues with this but at times of heavier load and on complex subgraphs, those recursive call stacks contributed to severe replication lag on the replicas they were running on.

Moving the traversal logic into the application and just sending SQL queries to Postgres (we don't use an ORM) eliminated the lag. RTT between the application and the db was a few ms and this wasn't user-facing logic anyway, so extra latency wasn't an issue in this case.

Probably the fundamental problem here was a sub-optimal schema, but sometimes you're just working with what you've got. Plus a commenter on Reddit pointed out that if we used pure SQL functions instead of PL/pgSQL, we'd also have seen better performance then.

"Probably the fundamental problem here was a sub-optimal schema, but sometimes you're just working with what you've got. Plus a commenter on Reddit pointed out that if we used pure SQL functions instead of PL/pgSQL, we'd also have seen better performance then."

So, would the better advice not have been to use simpler SQL instead of complex recursive statements, instead of taking a drastic approach to abandon ship (move logic to a completely new layer)?

Also, if you're doing string concats manually for your Json, this might cause some overhead for larger objects. ??

> So, would the better advice not have been to use simpler SQL instead of complex recursive statements, instead of taking a drastic approach to abandon ship (move logic to a completely new layer)?

Probably, yep! But I didn't know that when I wrote it.

I didn't want to give any concrete advice at all tbh. The entire rationale for the post was that I'm not an expert and I've broken prod in some surprising ways and if I share those ways maybe it will stop other people making similar mistakes in future. But I guess I over-stepped in my discussion for this mistake, sorry about that.

> Also, if you're doing string concats manually for your Json, this might cause some overhead for larger objects. ??

Good point, I hadn't considered that part of it. It wasn't string concats, we were building it with `jsonb_set`, but I can definitely see the JSON structure in memory as being part of the problem now you mention it (although maybe that reinforces the argument for doing it in the application layer).

I’m fully against using triggers to implement business logic, but find stored procedures can be great for encapsulating some elements of application/business logic. I’ve got several applications that access the same databases, and putting logic in the database avoids needing to duplicate it in clients.

Most comments about debugability are nonsense. It’s just different, with some pros and cons. One simple example - if you have a bug in production, you’re not going to attach a debugger to your production application. But you can absolutely open a readonly connection to the database and start running queries, invoking functions, etc. It helps if you can architect your functions to distinguish pure readonly functions from those with side effects, but you can still debug even if that’s not the case.

Probably because you can't do proper testing as easy as application code. And debugging is much harder.
I think I’m this is a commonly stated fact, but I don’t find it particularly true. Like any other technology, you just need to put in some initial effort to set up your test framework. In the case of PostgreSQL, pgTAP does a great job.
It is commonly stated and I found it to be very true. PostgreSQL is quite advanced in its procedural aspects (Oracle isn't too far behind either) but they were not made with particular focus on debugging. I'll need to have hacks like creating temp tables to dump records at a given stage vs simply setting a breakpoint. I can unit test the shit out of bog standard Java code; PL/SQL for all its capabilities doesn't even come close. The one area this tilts to the other side is when you need to do heavy processing with high volume of data on database side; a well written stored proc would handily outperform application side logic simply due to the network latency involved. But for typical use cases, putting complex business logic in stored procs just isn't worth it.
I disagree on both points.

Edit: but I was referencing specific performances claims, that you will somehow take some load of database server. I just don't see it.

The context here was that it’s not free, as I understood it. So, moving logic to the database, might make it faster, but that doesn’t mean that it’s instantaneous or that I no longer have to think about the scaling concerns of it.

So, personally, I read that section as “logic in the database is not a zero cost abstraction.