Hacker News new | ask | show | jobs
by silon7 3972 days ago
To me, a major important thing about NoSql is the predictable behavior (because all queries are simple and there is no "query optimizer").

If I could ensure SQL never using index/table scans and make it fail without proper indexes it would be a major help.

1 comments

How not having a query optimizer makes queries more predictable? It makes them slower for sure, but not more predictable.

Indeed, I find NoSQL very unpredictable. When your query hits and index, the query goes well. When not, you usually end up doing a whole database scan. Plus you usually cannot use more than one index for a given query, and some NoSQL require an index just for ordering....

I want my queries to always behave the same and fail without an index.

(you can't have a reliable system with random behavior where system drops to table scans when statistics are incomplete or something)

And yes, ordering requires an index.

I understand you want to be in "control" of how exactly a query is beeing executed (what indices it uses). There is a lot to be said about query optimization but it is not "random". My guess is that most queries can be optimized automatically and it shows that sometimes indices are not even needed (see https://robots.thoughtbot.com/why-postgres-wont-always-use-a...) whereas the "manual" approach would always use an index.

In PostgreSQL if you want to know what indices are used by a query, just ask the system using an EXPLAIN ANALYZE query and if it does not use any indices, create them (or live with the performance).

Also there are "Index-only scans" which I do not know much about but may well fit your approach: https://wiki.postgresql.org/wiki/Index-only_scans

So to summarize: automatic index selection is not random and slow queries can be identified pretty easy.

> I want my queries to always behave the same and fail without an index.

So I take it you are not using any kind of cache in your memory hierarchy?

This exactly where I observe MSSQL dropping to table scans too soon. If indexes are cached (or when using a ssd), they will be much faster for many many records than a table scan (which will always require disk access in a realistic scenario).

I agree that table scan will be faster if I need a large percentage of the table (more than can be cached).

There are many queries for which using an index is a worse query plan than running them with it. Indexes are not silver-bullet.

Plus having queries to fail if not use an index... seems worse than just having slower (if that would be the case) queries.