Hacker News new | ask | show | jobs
by Spivak 458 days ago
Finally! Someone else reaching the conclusion that the query planner is really annoying and for most queries I would just like to skip it.

I don't want the dynamic nature of the planner. I don't want to send SQL over the wire, I want to send the already completed plan that I either generated or wrote by hand. So many annoying performance bugs are because the planner did the slow thing. Just let me write/adjust it.

5 comments

Amazon reached the same conclusion and widely prefer dynamodb for this reason
This depends entirely on your use-case though right? It's not generic advice.

If your use-case is a data warehouse, then you absolutely want more than a K/V database and likely dynamic query plans because the point is dynamic usage. If your use-case is the serving frontend for a >1m request per second API, then sure, you probably don't want the complexity of a relational database and query planner.

Most things are somewhere in the middle and need to give serious consideration to this.

The complexity of a relational database is usually very much needed, with all the joins, subqueries, etc.

The poster complains about query plans being unnecessarily dynamic; for certain queries, it should be pinned, and only changed in a controlled way. Compare it to something like pip or npm; not being able to pin versions of certain packages could be a source of endless frustrations.

Pinning a query plan to a query could very well be a feature of a relational DB, an it is. Postgres (pg_hint_plan extension), Oracle (a bunch), MS SQL (somehow), they all have ways to pin the query plan. Not sending SQL is calling a stored procedure, also a long-standing feature of relational databases.

Knowing your tech stack goes a long way in battling frustration.

> Not sending SQL is calling a stored procedure

That's not what I mean, I don't want to bother with the SQL layer at all. I want to generate the query plan from the client side and send it off to be executed.

And to my knowledge the hinting extensions don't actually allow you to skip the ceremony and supply your own plan, just (in a very hacky way) adjust it as it runs.

All I want is K-V store with indexes.

Let database enforce serialization format (JSON, BSON, MessagePack, protobuf.. anything really) + create and maintain indices, using this fancy crash-proof logic it has. That'll cover 95% of all my database needs.

(OP also asks for row-based layout, types, and non-trivial language. I think those parts are entirely optional)

A K-V store by construction has an index (over K). But you seem to also want referential integrity and transactions, not small features, limiting the implementation quite a lot.

You can attain what you desire by using an RDBMS, and having all tables with one key column, and a TEXT column with your serialized non-key fields; it's going to be a fun approximation of 6NF. Realistically, you can have all joinable columns as normal columns, indexed as you desire, and the rest of the columns as a serialized blob.

When you want high parallelism for guaranteed independent segments of data, use sharding.

FoundationDB supports referential integrity and indexes in a layer on top of the key value store.

https://apple.github.io/foundationdb/layer-concept.html

+1.

Indexes, triggers (very good abstraction covering everything from computed fields to dependent fields), transactions.

Use foundationdb then
Is weird, but what the OP was asking for, actually exist before in the case of FoxPro and similar tools.

In Fox, you write more or less `physical query plans` as syntax:

   USE customer  && Opens Customer table
   CLEAR
   SCAN FOR UPPER(country) = 'SWEDEN'
      ? contact, company, city
   ENDSCAN

And what it make this even better, is that you can also write `SQL` so you can have the best of both worlds.

BTW, I think this idea can be move even further and my take is at https://tablam.org

Realistically, what should happen is reusable queries in the database with pre-cached plans as well as planner scripting and eliminating the index chooser for standard transactional queries. For real-time ad-hoc queries, the planners can be used, but for the ones happening 1000s of times a second... best to stick with a cached plan.