

	by sourcegrift 112 days ago
	We have everything optimized, and yet somehow DB queries need to be "interpreted" at runtime. There's no reason for DB queries to not be precompiled.

6 comments

jpfr 112 days ago

The "byte-code" coming from the query planner typically only has a handful of steps in a linear sequence. Joins, filters, and such. But the individual steps can be very costly.

So there is not much to gain from JITing the query plan execution only.

JITing begins to make more sense when the individual query plan steps (join, filter, ...) can themselves be specialized, recompiled, improved, or merged using knowledge of the surrounding query plan.
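To make the "merge the steps" point concrete, here is a minimal sketch in Python. The plan format, the `interpret` function, and the `compile_fused` function are all made up for illustration and are not how any particular engine works. The interpreted version runs each step as its own pass over the data, while the generated version folds the filter and the projection into a single loop.

```python
# Hypothetical mini "plan": a short linear list of steps over dict rows.
PLAN = [
    ("filter", "row['qty'] > 10"),
    ("project", "row['qty'] * row['price']"),
]


def interpret(plan, rows):
    """Run every step as its own pass, materializing a list in between."""
    data = list(rows)
    for kind, expr in plan:
        code = compile(expr, "<plan>", "eval")
        if kind == "filter":
            data = [row for row in data if eval(code, {}, {"row": row})]
        elif kind == "project":
            data = [eval(code, {}, {"row": row}) for row in data]
    return data


def compile_fused(plan):
    """Generate one function that applies all steps inside a single loop."""
    lines = ["def fused(rows):", "    out = []", "    for row in rows:"]
    for kind, expr in plan:
        if kind == "filter":
            lines.append(f"        if not ({expr}):")
            lines.append("            continue")
        elif kind == "project":
            lines.append(f"        row = {expr}")
    lines.append("        out.append(row)")
    lines.append("    return out")
    namespace = {}
    exec("\n".join(lines), namespace)  # stand-in for real code generation
    return namespace["fused"]


rows = [{"qty": 5, "price": 2.0}, {"qty": 20, "price": 1.5}]
print(interpret(PLAN, rows), compile_fused(PLAN)(rows))
```

Both paths return the same rows; the fused one simply avoids the intermediate list and the per-step dispatch, which is the kind of gain the comment is pointing at.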

catlifeonmars 112 days ago

This is a neat idea. I want to take it further and precompile the entire DBMS binary for a specific schema.

menaerus 112 days ago

Someone is already working on it: https://arxiv.org/pdf/2603.02081

catlifeonmars 111 days ago

That looks interesting but it seems inefficient to put an LLM directly into the compilation pipeline, not to mention that it introduces nondeterministic behavior.

menaerus 110 days ago

It has different limitations but inefficiency doesn't seem likely to be one of them. Did you read the Experimental Results section?

> Figure 2 shows the experimental results, and GenDB outperforms all baselines on every query in both benchmarks. On TPC-H, GenDB achieves a total execution time of 214 ms across five representative queries.

> This result is 2.8× faster than DuckDB (594 ms) and Umbra (590 ms), which are the two fastest baselines, and 11.2× faster than ClickHouse.

> On SEC-EDGAR, GenDB achieves 328 ms, which is 5.0× faster than DuckDB and 3.9× faster than Umbra.

> The performance gap increases with query complexity. For example, on TPC-H Q9, which is a five-way join with a LIKE filter, GenDB completes in 38 ms, which is 6.1× faster than DuckDB. GenDB uses iterative optimization with early stopping criteria.

> On TPC-H, Q6 reaches a near-optimal time of 17 ms at iteration 0 with zone-map pruning and a branchless scan, and does not require further optimization. In contrast, Q18 starts at 12,147 ms and decreases to 74 ms by iteration 1, which is a 163× improvement. This gain comes from replacing a cache-thrashing hash aggregation with an index-aware sequential scan.

> On SEC-EDGAR, Q4 decreases from 1,410 ms to 106 ms over three iterations, which is a 13.3× improvement, and Q6 decreases from 1,121 ms to 88 ms over four iterations, which is a 12.7× improvement. In Q6, the optimizer gradually fuses scan, compact, and merge operations into a single OpenMP parallel region, which removes three thread-spawn overheads. By iteration 1, GenDB already outperforms all baselines

vladich 110 days ago

That's all great, but sadly impractical. I looked at one of the first statements:

> GenDB is an LLM-powered agentic system that decomposes the complex end-to-end query processing and optimization task into a sequence of smaller and well-defined steps, where each step is handled by a dedicated LLM agent.

And knowing typical LLM latency, it's outside the realm of OLTP and probably even OLAP. You can't wait tens of seconds to minutes until an LLM generates some optimal code that you then compile and execute.

menaerus 110 days ago

No, that's not how I believe they intended it to work. They generate the workload-specific engine up-front and not when the query arrives.

WJW 112 days ago

How will you handle ALTER TABLE queries without downtime?

catlifeonmars 112 days ago

That would definitely present a bit of a challenge, but:

- not all databases need migrations (or migrations without downtime)

- alternatively, ship the migrations as part of the binary

Adhoc modifications would still be more difficult but tbh that’s not necessarily a bug

Asm2D 112 days ago

Many SQL engines have JIT compilers.

The problems related to PostgreSQL are pretty much all described here. It's very difficult to do low-latency queries if you cannot cache the compiled code and do it over and over again. And once your JIT is slow, you need logic to decide whether to interpret or compile.

I think it would be the best to start interpreting the query and start compilation in another thread, and once the compilation is finished and the interpreter is still running, stop the interpreter and run the JIT-compiled code. This would give you the best latency, because there would be no waiting for the JIT compiler.
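A minimal sketch of that idea in Python, with everything in it assumed for illustration: `slow_step` stands in for the interpreter, `compile_query` stands in for the JIT compiler, and the "query" just doubles each row. The interpreter starts immediately, compilation runs on a background thread, and the loop hands the remaining rows to the compiled function as soon as it is ready.

```python
import threading


def slow_step(x):
    """Stand-in for the interpreter: processes one row at a time."""
    return x * 2


def compile_query():
    """Stand-in for a JIT compiler; returns a function over a batch of rows."""
    return lambda rows: [x * 2 for x in rows]


def run_query(rows, compile_fn=compile_query):
    compiled = {}
    ready = threading.Event()

    def worker():
        compiled["fn"] = compile_fn()
        ready.set()  # publish only after the function is stored

    threading.Thread(target=worker, daemon=True).start()

    out = []
    for i, row in enumerate(rows):
        if ready.is_set():
            # Compilation finished mid-query: switch tiers for the rest.
            out.extend(compiled["fn"](rows[i:]))
            return out, "compiled"
        out.append(slow_step(row))
    return out, "interpreted"


result, tier = run_query(list(range(1000)))
print(len(result), tier)
```

Which tier finishes the query depends on timing, so only the result is deterministic. A real engine would also need a safe point at which interpreter state can be handed over, which this sketch sidesteps by switching at row boundaries.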

aengelke 112 days ago

> It's very difficult to do low-latency queries if you cannot cache the compiled code

This is not too difficult, it just requires a different execution style. Salesforce's Hyper for example very heavily relies on JIT compilation, as does Umbra [1], which some people regard as one of the fastest databases right now. Umbra doesn't cache any IR or compiled code and still has an extremely low start-up latency; an interpreter exists but is practically never used.

Postgres is very robust and very powerful, but simply not designed for fast execution of queries.

Disclosure: I work in the group that develops Umbra.

[1]: https://umbra-db.com/

Asm2D 111 days ago

If I recall the research papers on Umbra correctly, it also uses AsmJit as a JIT backend, which means that theoretically the compilation times would be comparable if you only consider code-emitting overhead.

The problem will always be queries where the compilation is orders of magnitude more expensive than the query itself. I can imagine an indexed lookup of one or a few entries, etc. Accesses to indexed entries like these are very well optimized by SQL query engines, and JIT-optimizing them possibly makes no sense.
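The "logic to decide whether to interpret or compile" can be as simple as a cost comparison. The sketch below is an assumption-laden illustration: the function name, the cost units, and the `speedup` factor are made up. PostgreSQL exposes a planner-cost threshold of this general kind as `jit_above_cost`.

```python
def choose_executor(estimated_query_cost, estimated_compile_cost, speedup=3.0):
    """Pick JIT only if compile time plus faster execution beats interpreting.

    All inputs are hypothetical planner estimates in the same cost unit;
    `speedup` is an assumed ratio of interpreted to compiled run time.
    """
    interpreted_total = estimated_query_cost
    compiled_total = estimated_compile_cost + estimated_query_cost / speedup
    return "jit" if compiled_total < interpreted_total else "interpret"


# A point lookup is far cheaper than compiling; a big scan is not.
print(choose_executor(1, 100), choose_executor(10_000, 100))
```

The weak spot is the estimate itself: if the planner misjudges the query cost, the threshold fires in the wrong direction, which is one reason starting the interpreter while compiling in parallel is attractive.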

vladich 111 days ago

Interesting... AsmJit is pretty fast for compilation, but about 3x slower than sljit. The only way I can see to make it fast enough, in theory (i.e. without slowing down point-lookup queries and such), would be to fuse planning with code generation, i.e. essentially a single-pass plan builder + compiler. Not sure if Umbra tries to do that, and AsmJit is not the best choice for it anyway, but with sljit it could be on par with the interpreter even for the fastest queries, I believe. Pretty hard (likely impossible) to implement though; planning is inherently a non-linear process...

Asm2D 111 days ago

Because pg_jitter uses AsmJit's Compiler, which also allocates registers. That's much more work than using hardcoded physical registers as in the SLJIT case. There is always a cost to such comfort.

I think AsmJit's strength is the completeness of its backends, as you can emit nice SIMD code with it (like AVX-512). But the performance could be better, of course; making it 2x faster would be possible.

vladich 111 days ago

There are other issues with that auto-allocation. I tested all 3 backends on very large queries (hundreds of KBs per query). Performance of all of them (+LLVM, but -sljit) was abysmal: the compiler overhead was seconds to tens(!) of seconds. They have some non-linear components in their optimization algorithms, while sljit scaled linearly and was almost as fast as for smaller queries. So yes, auto-allocation gives higher run-time performance, but the cost of that performance grows non-linearly with code size and complexity, and you can still get good performance with manual allocation. I also don't believe you can make AsmJit 2x faster without sacrificing that auto-allocation algorithm.

vladich 111 days ago

SLJIT is a bit smarter than just using hardcoded registers. It's multi-platform anyway, so it uses registers when they are available on the target platform and falls back to memory when they are not. That's why performance can differ between Windows and Linux on x64, for example: a different number of available registers.

vladich 111 days ago

Good point about SIMD opportunities though - it's something the other 2 JITs lack.

aengelke 108 days ago

I'm a bit late, but: Umbra hasn't used AsmJit for many years; it was too slow.

chrisaycock 112 days ago

> I think it would be the best to start interpreting the query and start compilation in another thread

This technique is known as a "tiered JIT". It's how production virtual machines operate for high-level languages like JavaScript.

There can be many tiers, like an interpreter, baseline compiler, optimizing compiler, etc. The runtime switches into the faster tier once it becomes ready.

More info for the interested:

https://ieeexplore.ieee.org/document/10444855

hinkley 112 days ago

It’s also common for JITs to sprout a tier and shed a tier over time, as the last and first tiers shift in cost/benefit. If the first tier works better you delay the other tiers. If the last tier gets faster (in run time or code optimization) you engage it sooner, or strip the middle tier entirely and hand half that budget to the last tier.

Asm2D 111 days ago

I write JITs so I know, but I always try to write in a way that even non-JIT people can understand :)

vladich 111 days ago

The idea with parallel compilation is interesting. Worth considering, in some cases. The only problem with it is the same as too much parallelization - you can exhaust your CPU resources much faster. But with some sort of smart scheduling it should work. I'll think about it, thanks!

SigmundA 112 days ago

Postgresql uses a process per connection model and it has no way to serialize a query plan to some form that can be shared between processes, so the time it takes to make the plan including JIT is very important.

Most other DBs cache query plans, including jitted code, so they are basically precompiled from one request to the next with the same statement.

zaphirplane 112 days ago

What do you mean? Because the obvious thing is a shared cache, and if there is one thing the writers of a DB know, it is locking.

SigmundA 112 days ago

Sharing executable code between processes is not as easy as sharing data. AFAIK, unless something's changed recently, PG shares nothing about plans between processes and can't even share a cached plan between sessions/connections.

llm_nerd 112 days ago

Executable code is literally just data that you mark as executable. It already generated the JIT code, and the idea that it can't then share it between processes is incomprehensible.

I was actually confused by this submission as it puts so much of an emphasis on initial compilation time, when every DB (apparently except for pgsql) caches that result and shares it/reuses it until invalidation. Invalidation can occur for a wide variety of reasons (data composition changing, age, etc), but still the idea of redoing it on every query, where most DBs see the same queries endlessly, is insane.

SigmundA 112 days ago

No, a lot of jitted code has pointers to addresses specific to that process, which make no sense in another process.

Making code shareable between processes takes effort and will have a tradeoff in performance, since it is not specialized to the process.

If the query plan were at least serializable (it's more like an AST), then at least that part could be reused, and then maybe each process could keep jitted code cached in memory that the plan can reference by some key.

DBs like MSSQL avoid the problem because they run a single OS process with multiple threads instead. This is also why it can handle more connections easily, since each connection is not a whole process.
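The "serializable plan plus per-process jitted code referenced by key" idea could look roughly like this. It is a toy sketch with assumed names (`serialize_plan`, `ProcessLocalCodeCache`) and a made-up plan shape, not PostgreSQL's plan format. The plan is shared as plain bytes, and each backend compiles it at most once and keeps the result locally, keyed by a hash of the plan.

```python
import hashlib
import json


def serialize_plan(plan):
    """Turn a plan (plain dicts/lists here) into bytes any process can read."""
    return json.dumps(plan, sort_keys=True).encode("utf-8")


class ProcessLocalCodeCache:
    """Per-process map from plan key to compiled code; never shared."""

    def __init__(self):
        self._code = {}
        self.compile_count = 0

    def get(self, plan_bytes):
        key = hashlib.sha256(plan_bytes).hexdigest()
        fn = self._code.get(key)
        if fn is None:
            fn = self._compile(json.loads(plan_bytes))
            self._code[key] = fn
        return fn

    def _compile(self, plan):
        # Stand-in for JIT compilation: build a closure from the plan.
        self.compile_count += 1
        column, threshold = plan["filter"]["column"], plan["filter"]["gt"]
        return lambda rows: [r for r in rows if r[column] > threshold]


# Two caches simulate two backend processes reading the same shared plan.
shared_plan = serialize_plan({"filter": {"column": "qty", "gt": 10}})
backend_a, backend_b = ProcessLocalCodeCache(), ProcessLocalCodeCache()
data = [{"qty": 5}, {"qty": 20}]
backend_a.get(shared_plan)(data)
backend_a.get(shared_plan)(data)
backend_b.get(shared_plan)(data)
print(backend_a.compile_count, backend_b.compile_count)
```

Planning is paid once overall, while compilation is still paid once per process, so this only helps if connections live long enough to reuse their local entries.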

patagurbon 112 days ago

What does specialized to the process mean? Lots of JIT tooling these days readily supports caching and precompilation. Invalidation is hard but things like reloading global references are hardly intractable problems especially for an org as large as pgsql.

vladich 112 days ago

The emphasis on compilation time there is because the JIT provider that comes with Postgres (LLVM-based) is broken in that particular area. But you're right, JITed code can be cached if some conditions are met (it's position independent, for one). Not all JIT providers do that, but many do. Caching is on the table, but if your JIT compilation takes microseconds, caching could be rather a burden in many cases. It's still useful in some cases.

_flux 112 days ago

Write the binary to a file, call it `libquery-id1234.so`, and link that to whichever processes need it?

SigmundA 112 days ago

Might want to take a look at some research like this [1] that goes over the issues:

"This obvious drawback of the current software architecture motivates our work: sharing JIT code caches across applications. During the exploration of this idea, we have encountered several challenges. First of all, most JIT compilers leverage both runtime context and profile information to generate optimized code. The compiled code may be embedded with runtime-specific pointers, simplified through unique class-hierarchy analysis, or inlined recursively. Each of these "improvements" can decrease the shareability of JIT compiled code."

Anything's doable here with enough dev time. Would be nice if PG could just serialize the query plan itself, maybe just as an SO along with non-process-specific executable code that then has to be dynamically linked again in other processes.

1. https://dl.acm.org/doi/10.1145/3276494

vladich 112 days ago

Won't work well if it executes 20k+ queries per second. Filesystem will be a bottleneck among other things.

_flux 112 days ago

You can put more than one function in one file.

hans_castorp 112 days ago

> and it has no way to serialize a query plan to some form that can be shared between processes

https://www.postgresql.org/docs/current/parallel-query.html

"PostgreSQL can devise query plans that can leverage multiple CPUs in order to answer queries faster."

SigmundA 112 days ago

Nothing to do with plan caching; that's just talking about plan execution of parallel operations. Is that thread or process based in PG?

If process based then they can send small parts of plan across processes.

hans_castorp 112 days ago

Ah, didn't see the caching part.

Plans for prepared statements are cached though.

SigmundA 112 days ago

Yes, if the client manually prepares the statement it will be cached for just that connection, because in PG a connection is a process, but it won't survive from one connection to the next even in the same process.

Other databases like MSSQL have prepared statements, but they are rarely used nowadays since plan caching based on query text was introduced decades ago.
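As a rough illustration of what "plan caching based on query text" means, here is a toy cache in Python. `TextKeyedPlanCache` and `toy_planner` are hypothetical names, and no real database's cache is this simple. The server keeps plans in a map keyed by the statement text, so any connection sending the same parameterized text reuses the plan without an explicit prepare step.

```python
class TextKeyedPlanCache:
    """Server-wide cache: statement text -> plan, shared by all connections."""

    def __init__(self, planner):
        self._planner = planner
        self._plans = {}
        self.plan_calls = 0

    def execute(self, sql, params=()):
        plan = self._plans.get(sql)
        if plan is None:
            self.plan_calls += 1
            plan = self._planner(sql)
            self._plans[sql] = plan
        return plan(params)


def toy_planner(sql):
    """Hypothetical planner: the 'plan' just echoes the text and parameters."""
    return lambda params: (sql, tuple(params))


cache = TextKeyedPlanCache(toy_planner)
cache.execute("SELECT * FROM t WHERE id = $1", (1,))
cache.execute("SELECT * FROM t WHERE id = $1", (2,))  # same text: plan reused
cache.execute("SELECT * FROM t WHERE id = 3")  # inlined literal: new plan
print(cache.plan_calls)
```

The last call shows the catch with a purely text-keyed cache: statements that inline their literals produce a distinct key each time, so they only benefit if the client parameterizes them.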

AlisdairO 112 days ago

Only on a per-connection basis

levkk 112 days ago

See prepared statements.

array_key_first 112 days ago

DB queries do get precompiled and cached if you use prepared statements. This is why you should always use prepared statements if you can.

kbolino 112 days ago

It is not always necessary to explicitly use prepared statements, though. For example, the pgx library for Go [1] and the psycopg3 library for Python [2] will automatically manage prepared statements for you.

[1]: https://pkg.go.dev/github.com/jackc/pgx/v5#hdr-Prepared_Stat...

[2]: https://www.psycopg.org/psycopg3/docs/advanced/prepare.html