Hacker News new | ask | show | jobs
by ulrikrasmussen 268 days ago
I am split on SQL. On one hand I love the declarative approach and the fact that I can improve the run-time complexity of my queries just by adding an index and leaving the queries as is. On the other hand, I hate how the run-time complexity of my queries can suddenly go from linear to quadratic if the statistics are not up to date and my query planner misjudges the amount of rows returned by a complex sub-query so it falls back to a sequential scan instead of an index lookup.

I would be interested in a query execution language where you are more explicit about the use of indexes and joins, and where a static analysis can calculate the worst-case run time complexity of a given query. Does anyone know if something like that exists?

2 comments

I've had the same thought. I would be interested in a query language which allows me to write a query plan, including explicitly scanning indexes using specific methods.

I like the "plumbing on the outside" approach of databases like ClickHouse where you have to know the performance characteristics of the query execution engine to design performant schemas and queries.

The challenge here is that query planners are quite good at determining the best access path based on statistical information about the data. For example, "id = 1" should probably use a fast index scan because it's a point query with a single row, but "category = 42" might be have a million rows with the value 42 and would be faster with a sequential scan. But the problem doesn't surface until you have that much data. Query planners adapt to scale, while a hard-coded query plan wouldn't. That's not even getting started on things like joins (where join side order matters a lot) and index scans on multiple columns.

I'm not aware of any systems that are like this. Some of the early database systems (ISAM/VSAM, hierarchical databases like CODASYL) were a bit like this, before the relational model allowed a single unified data paradigm to be expressed as a general-purpose query language.

What gets me is when it's faster to build an index and rerun the query from scratch than run the original.
That's less a SQL issue, than a general issue with attempting to optimize based off of heuristics. Generally when "hints" are supported that's what'll happen every time (baring the hint not being executable).

But, remember that a lot of users wanting to query things aren't going to be app developers. They'll be wanting to do a one off query or run some report. You'll tend to need an optimizer no matter what.

That's true, and you probably want both. You want a general purpose query planner for any user-generated one-off queries, but for the hot application code paths where you execute the same fixed queries again and again, you may be interested in stronger guarantees about your run time complexity.