Hacker News new | ask | show | jobs
by bryanmig 5515 days ago
The point is that when your table has 4 rows in it, it doesn't matter if you do a full table scan. Wait until you deploy your app and that table now has 4,000,000 rows in it.. then you have a problem.

In MongoDB (and maybe others too) there exists an option to fail when doing a table scan so that you know immediately while developing that there exists a potential for things to go awry at some later point. You can fix that now by either re-thinking your data structure, re-thinking your query, or adding appropriate indexes.

MySQL's slow query log is good for reactive development instead of proactive development.

I think its an indispensable feature.

2 comments

Errr... With a typical RDBMS, doing a full table scan depends on the current stats (e.g., number of rows, are the stats current, state of the index, possibility to use an index, presence of cached data, previous query performance etc.). It does not depend only on the query and the data structure, so it is near impossible at development time to be certain that a table scan will occur during production. In fact, a table scan might occur today and not tomorrow for the same query, but with different data.

Many developers have been burned by trying to optimize their queries and indexes too early. The query planner, although sometimes unpredictable, is generally better than developers at predicting the performance of a particular execution plan. Table scans are also preferable in certain situations.

Relying on the fact that RDBMS have sophisticated query planner (as opposed to MongoDB) is not a sign of "reactive development". It just means that you should test your queries (and your indexes) with close-to-production data, and, as the data grows and evolves, continue to test, because the best execution plan might change.

Slower query using indexes is better than randomly using table scans depending on statistics, ...

At least for interactive apps, NoSQL wins.

There are cases where a full table scan is faster than an index scan, and there are cases where the index scan is faster. There are even cases where the only option is a full table scan!

I still don't see what a setting that gives errors when a particular access method has been chosen would be good for.

You don't care about the plan, you care about how fast the query runs.

You'll always have to do reactive development when trying to get good performance. There is only so much simulating and testing you can do and real data volume and distribution changes all the time.