Hacker News
by jandrewrogers 1515 days ago
In a well-designed system, you will typically be limited by effective bandwidth, often memory bandwidth or how efficiently you use it, which is an area where vectorization can help. Modern servers have tremendous storage bandwidth if you have an I/O scheduler capable of using it. Some newer database engines explicitly reject the assumption that storage throughput is a scarce resource to design around, since that has become much less true over time due to advances in hardware.

Page layouts highly optimized for vectorized evaluation are common now even if the implementation isn't vectorized. You lose nothing on modern hardware (they are good layouts regardless) and it lets you easily add vector optimizations later. As a semantic distinction, columnar and vector layouts are organized differently and optimize for somewhat different things, even though they look superficially similar. Classic DSM-style columnar is largely obsolete.
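To make the layout point concrete, here is a minimal sketch of one possible reading of a "vector-friendly" page: each field is stored as a dense array inside a fixed-size page, rather than interleaved record by record. The names (RowPage, VectorPage, kRowsPerPage, sum_prices) and the two-field schema are hypothetical and chosen only for illustration; real engines differ in many details.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical page capacity in rows; real engines usually size pages in bytes.
constexpr std::size_t kRowsPerPage = 1024;

// Row-oriented page: the fields of one record sit next to each other.
struct RowPage {
    struct Row {
        std::int32_t id;
        float price;
    };
    std::array<Row, kRowsPerPage> rows;
};

// Vector-friendly page: each field is a contiguous array within the page.
struct VectorPage {
    std::array<std::int32_t, kRowsPerPage> id;
    std::array<float, kRowsPerPage> price;
};

// Strided access: half of every cache line fetched holds the unused id field.
inline double sum_prices(const RowPage& page, std::size_t n) {
    assert(n <= kRowsPerPage);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += page.rows[i].price;
    return total;
}

// Unit-stride access over one dense array, the shape SIMD loads prefer.
inline double sum_prices(const VectorPage& page, std::size_t n) {
    assert(n <= kRowsPerPage);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += page.price[i];
    return total;
}
```

Both functions return the same answer for the same data. The second loop is a good layout for a plain scalar implementation too, which is the "you lose nothing" part.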

Vectorization, first and foremost, is about optimizing selection operations in a database, but it can provide assists in other areas like joins, sorts, and aggregates. Most queries are composed from these primitives, so many parts of the query plan may benefit. As a heuristic, operations that GPU databases excel at are the same kinds of operations that benefit from vectorization.
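As a rough illustration of what a vectorization-friendly selection looks like, here is a scalar, branch-free sketch that filters a column into a selection vector of matching row indices and then aggregates over it. It is not SIMD code and not any particular engine's implementation; the function names (select_between, sum_selected) are hypothetical. The point is the shape: no data-dependent branches, just a compare and a conditional advance, which is the pattern that SIMD compare and compress operations can replace.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Writes the indices of rows with lo <= values[i] <= hi into `out`
// and returns the number of matches.
inline std::size_t select_between(const std::vector<std::int32_t>& values,
                                  std::int32_t lo, std::int32_t hi,
                                  std::vector<std::uint32_t>& out) {
    assert(values.size() <= UINT32_MAX);  // row indices must fit in 32 bits
    out.resize(values.size());            // worst case: every row matches
    std::size_t n = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        out[n] = static_cast<std::uint32_t>(i);  // unconditional store
        // Advance the output cursor only on a match; no branch on the data.
        n += static_cast<std::size_t>((values[i] >= lo) & (values[i] <= hi));
    }
    out.resize(n);
    return n;
}

// Aggregate primitive that consumes the selection vector.
inline std::int64_t sum_selected(const std::vector<std::int32_t>& values,
                                 const std::vector<std::uint32_t>& selection) {
    std::int64_t total = 0;
    for (std::uint32_t row : selection) total += values[row];
    return total;
}
```

Composing the two gives the equivalent of SELECT SUM(v) WHERE v BETWEEN lo AND hi. The selection vector is what lets later operators (joins, aggregates) work only on surviving rows without copying the column.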

Obviously you can't just throw vectorization at an arbitrary database and expect major benefits; the database needs to be intentionally designed for it.