Hacker News
by nerdymanchild 2998 days ago
Does this actually improve the performance of most queries? Most queries are light on computation and heavy in IO. Seems kind of like a waste of effort but maybe there are people with very complex / compute-heavy queries.
2 comments

> Does this actually improve the performance of most queries?

"most queries" - probably not. By sheer number that's going to be OLTP queries, and there it doesn't help. You need analytics queries that take upwards of 0.2s or such to benefit.

> Most queries are light on computation and heavy in IO.

That's something often said. I think it's definitely wrong today and at least has been for a while. In a lot of workloads a good chunk of the hot data set is in memory, and even a single decent SSD can often more than saturate a single core.

If you look at analytics benchmarks and real world analytics usage, you'll often see CPU being the bottleneck. Using multiple cores can alleviate that to some degree, but that can imply a need for bigger hardware / less concurrency, and it doesn't come for free. Efficiency is important.
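
To make the "upwards of 0.2s" threshold above a bit more concrete, here is a back-of-the-envelope sketch of the break-even point. The function name and the numbers (0.1s to compile, a 2x execution speedup) are made up for illustration and are not measurements from any particular database; the model also ignores everything except a fixed compile cost and a constant speedup.

```python
def jit_break_even_runtime(compile_cost_s: float, speedup: float) -> float:
    """Interpreted runtime (seconds) above which JIT comes out ahead.

    JIT wins when: compile_cost + t / speedup < t
    which rearranges to: t > compile_cost * speedup / (speedup - 1)
    """
    if compile_cost_s < 0:
        raise ValueError("compile cost must be non-negative")
    if speedup <= 1.0:
        # No execution speedup means the compile cost is never recovered.
        return float("inf")
    return compile_cost_s * speedup / (speedup - 1.0)


# Hypothetical numbers: 0.1s of compilation, execution becomes 2x faster.
threshold = jit_break_even_runtime(compile_cost_s=0.1, speedup=2.0)
print(f"JIT pays off for queries slower than about {threshold:.2f}s")
```

Under this model a short OLTP query never gets near the threshold, while a multi-second analytics query clears it easily.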

Depends on the workload and the database. Many workloads on modern hardware are more often limited by memory and network bandwidth than I/O per se. (This partly depends on database kernel architecture and presumes a modern design. Some database engines, particularly older designs, may have other bottlenecks.) Also, for scale-out databases you can compile the query once and ship the binary or IR to every relevant node, which amortizes the compilation cost.
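
A rough sketch of the amortization arithmetic for the scale-out case, assuming a fixed per-query compile cost and a small per-node cost to ship the compiled artifact. The function name, parameters, and numbers are all hypothetical and only show the shape of the trade-off.

```python
def total_compile_cost(nodes: int, compile_cost_s: float,
                       ship_cost_s: float = 0.0,
                       compile_once: bool = True) -> float:
    """Cluster-wide seconds spent getting a compiled query onto every node."""
    if nodes < 1:
        raise ValueError("need at least one node")
    if compile_once:
        # Compile once, then pay only the shipping cost per node.
        return compile_cost_s + nodes * ship_cost_s
    # Every node compiles the same query independently.
    return nodes * compile_cost_s


# Hypothetical cluster: 16 nodes, 0.1s compile, 2ms to ship the binary or IR.
once = total_compile_cost(16, 0.1, ship_cost_s=0.002, compile_once=True)
everywhere = total_compile_cost(16, 0.1, compile_once=False)
print(f"compile once and ship: {once:.3f}s, compile on every node: {everywhere:.3f}s")
```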

For classic OLTP workloads, JIT compilation doesn't carry much benefit because you are (typically) directly targeting a few records. The primary benefit is for constraint matching in page scan operators. I would say the sweet spot is low-latency operational analytics, i.e. queries in mixed workload environments; the average dwell time of a query on a particular page has a big impact. These workloads (basically "real-time analytics") are increasingly popular.
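
As a toy illustration of what "constraint matching in page scan operators" means, the sketch below contrasts re-interpreting a predicate description for every row with generating a specialized predicate once and reusing it across the page. This is only an analogy in Python: a real engine would emit machine code or IR, and the predicate format, function names, and page layout here are assumptions made for the example.

```python
import operator

# Hypothetical predicate format: (column, op, constant) triples, ANDed together.
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}


def scan_interpreted(page, predicate):
    """Walk the predicate description again for every row on the page."""
    return [row for row in page
            if all(OPS[op](row[col], const) for col, op, const in predicate)]


def compile_predicate(predicate):
    """Generate source for the predicate once and compile it to a function."""
    terms = []
    for col, op, const in predicate:
        if op not in OPS:
            raise ValueError(f"unsupported operator: {op}")
        py_op = "==" if op == "=" else op
        terms.append(f"row[{col!r}] {py_op} {const!r}")
    source = "lambda row: " + (" and ".join(terms) or "True")
    return eval(compile(source, "<predicate>", "eval"))


def scan_compiled(page, matches):
    """Apply the already-specialized predicate to every row."""
    return [row for row in page if matches(row)]


# A made-up "page" of rows and a made-up constraint.
page = [{"id": i, "qty": i % 7, "price": i * 1.5} for i in range(1000)]
predicate = [("qty", ">=", 5), ("price", "<", 300.0)]

matches = compile_predicate(predicate)  # paid once per query
assert scan_interpreted(page, predicate) == scan_compiled(page, matches)
```

The more rows a query touches per page, the more often the specialized predicate is reused, which is why the benefit shows up in scans rather than in point lookups.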

Before I had experience designing systems with JIT compiled queries I worried quite a bit about overhead. The people with expertise who assured me the overhead would be pretty small if implemented well turned out to be correct, which means you can JIT a lot of operations for material benefit.