

by jandrewrogers 2711 days ago

There is a practical engineering reason why the OLAP equivalent doesn't seem to exist. General purpose storage engines, and this applies to RocksDB, are like the C++ STL in that they provide good average performance across a wide range of common cases but are nowhere close to optimal if you have a well-defined type of data model and workload as your use case. You can always gain an integer factor increase in throughput by designing a less generalist implementation with a similar interface.
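To make the "less generalist implementation with a similar interface" point concrete, here is a minimal Python sketch. It is not taken from RocksDB or any real engine; the class names and the `put`/`scan` interface are hypothetical and chosen only for illustration. The assumed workload for the specialized version is keys that arrive in strictly increasing order, as timestamps often do.

```python
import bisect


class GeneralStore:
    """Generalist sketch: accepts keys in any order and keeps them sorted."""

    def __init__(self):
        self._keys = []
        self._vals = []

    def put(self, key, value):
        # Arbitrary insert position: binary search plus an O(n) list shift.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def scan(self, lo, hi):
        # Return the values whose keys satisfy lo <= key < hi.
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_left(self._keys, hi)
        return self._vals[i:j]


class AppendOnlyStore:
    """Specialized sketch: assumes strictly increasing keys (hypothetical workload)."""

    def __init__(self):
        self._keys = []
        self._vals = []

    def put(self, key, value):
        # The workload assumption removes the search and the shift on write.
        if self._keys and key <= self._keys[-1]:
            raise ValueError("keys must be strictly increasing")
        self._keys.append(key)
        self._vals.append(value)

    def scan(self, lo, hi):
        # Same read interface and semantics as GeneralStore.scan.
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_left(self._keys, hi)
        return self._vals[i:j]
```

Both classes answer the same range query the same way, but the second one gets its cheaper write path only by refusing inputs the first one handles. That trade (narrower applicability for less work per operation) is the one being described, at toy scale.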

As with the C++ STL, the limiting factor is the number of tunable parameters available, i.e. the amount of internal architectural flexibility built into the implementation. OLTP storage engines are pretty simple, so a manageable number of behavioral parameters can usually get you within 3x of the throughput of a more targeted design, which is acceptable performance for most workloads that are not ingest-intensive.

OLAP-ish storage engines, on the other hand, are at least an order of magnitude more complex to implement and have many more degrees of freedom depending on the expected data model and workload. There is a lot more data model and workload diversity in OLAP than in OLTP, so the internal architectural flexibility and the set of tunable parameters you would have to implement and maintain become very unwieldy. If you limited yourself to the same number of user-definable tuning and configuration parameters as an OLTP-oriented storage engine like RocksDB, the performance gap between a generalist implementation and a more targeted implementation would be more like 10-100x, which needless to say is huge. This makes the set of cases where someone would actually want to use a "general purpose" OLAP storage engine quite narrow, which diminishes the value of implementing one.

This leads to the current reality that there is a zoo of specialist storage engines for OLAP-ish workloads -- graph, time-series, event processing, geospatial, classic DW, etc. Much more generalist OLAP storage engines that handle several of these models could exist in theory, but the bar for technical sophistication and complexity is much higher than for OLTP.

Open source projects in particular tend to have a natural ceiling on the number of man-years invested to get an initial implementation of an architecture, which inherently limits the expressiveness of that architecture for software with this complexity.