Hacker News new | ask | show | jobs
by weliveindetail 828 days ago
> There is one way to make the LLVM JIT compiler more usable, but I fear it’s going to take years to be implemented: being able to cache and reuse compiled queries.

Actually, it's implemented in LLVM for years :) https://github.com/llvm/llvm-project/commit/a98546ebcd2a692e...

2 comments

Yeah, well, sorry, I should have been more explicit here: the issue is with PostgreSQL, not LLVM. The JIT compiler has to inject direct memory addresses, making the generated code specific to your query and process.
Interesting, because we store relocatable objects. And process symbols can be resolved by name if you really want. It might be yet another performance trade-off though.
Since PG uses one process per connection and the LLVM JIT code is process specific the code can't be shared amongst all connections to the DB.

Plans themselves suffer from this since they are in memory data structures not designed to be shared amongst different processes.

DB's like MSSQL don't have this issue since they use a single process with threads which is also why it can handle more concurrent connections without an external pooler. Although MSSQL can also serialize plans to a non process specific representation and store them in the DB for things like plan locking.

> Plans themselves suffer from this since they are in memory data structures not designed to be shared amongst different processes.

Oracle uses a process-per-connection model as well (at least on Linux), and they are able to share execution plans across connections. They put all the plans into the "global" shared memory.

Looks like you can change that with THREADED_EXECUTION to make it act like it does on Windows with a single process and threads:

>On UNIX, starting with Oracle Database 12c Release 2 (12.2), Oracle Database can use an operating system process or an operating system thread to implement each background task such as database writer (DBW0), log writer (LGWR), shared server process dispatchers, and shared servers.

The use of operating system threads instead of processes allow resource sharing and reduce resource consumption.

On Windows, each background process is implemented as a thread inside a single, large process.

https://docs.oracle.com/en/database/oracle/oracle-database/1...

Processes in Windows are much more expensive than Unix typically so using threads has always been preferred to multi process, perhaps thats why MSSQL only has that option with an almost fully recreated internal process model that you can list and kill etc.

Even Oracle says it helps with resource usage, even on Unix/Linux. Also looks like Oracle has had some kind shared mode for a long time where it basically has a built in pooler to keep actual OS process count down, not 1:1 like PG.

Sharing plans can obviously be done using shared memory but it's not a simple as just creating some C++ object model (which I believe is what PG has internally) for the plan it must have a process agnostic data format that is then executed probably by deserializing into a executable model from shared memory. Fully jitted code is even trickier vs just a set of logical plan operations. With threads you just share executable code.

Way back on Oracle 9i, we had a mystery stall problem. We couldn’t saturate the network, the CPU, or the fiber channel links. We were stuck at ~50% and stumped. Some fuckery was going on and we had to call in a professional.

Turned out 9i could only run queries that currently resided in the query cache, and some idiot (who was now my boss) had fucked up our query builder code so that we were getting too many unique queries. Not enough bind variables.

So it’s clear Oracle was using a shared cache back then, but like other people here, I’m scratching my head how this would work with Postgres’s flavor of MVCC. Maybe share query plans when the transaction completes?

I feel like that would get you 90% of the way but with some head of queue nastiness.

naive q: could one process perform the compilation, store the output in postgresql shmem, then each pg worker copy that code locally ?

can plans be shared ? I might be mistaken but I thought each worker backend can (e.g.) be assigned to a different partition in a partitioned table, with very different table indexes, statistics, etc.