|
|
|
|
|
by ulrikrasmussen
268 days ago
|
|
I am split on SQL. On one hand I love the declarative approach and the fact that I can improve the run-time complexity of my queries just by adding an index and leaving the queries as is. On the other hand, I hate how the run-time complexity of my queries can suddenly go from linear to quadratic if the statistics are not up to date and my query planner misjudges the amount of rows returned by a complex sub-query so it falls back to a sequential scan instead of an index lookup. I would be interested in a query execution language where you are more explicit about the use of indexes and joins, and where a static analysis can calculate the worst-case run time complexity of a given query. Does anyone know if something like that exists? |
|
I like the "plumbing on the outside" approach of databases like ClickHouse where you have to know the performance characteristics of the query execution engine to design performant schemas and queries.
The challenge here is that query planners are quite good at determining the best access path based on statistical information about the data. For example, "id = 1" should probably use a fast index scan because it's a point query with a single row, but "category = 42" might be have a million rows with the value 42 and would be faster with a sequential scan. But the problem doesn't surface until you have that much data. Query planners adapt to scale, while a hard-coded query plan wouldn't. That's not even getting started on things like joins (where join side order matters a lot) and index scans on multiple columns.
I'm not aware of any systems that are like this. Some of the early database systems (ISAM/VSAM, hierarchical databases like CODASYL) were a bit like this, before the relational model allowed a single unified data paradigm to be expressed as a general-purpose query language.